Storage system and data processing method
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- ALIBABA (CHINA) CO LTD
- Filing Date
- 2022-08-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing distributed storage systems suffer from write amplification and increased garbage data during garbage collection, impacting system performance and cost.
A storage system is provided in which a client receives a marking operation instruction, generates a data operation request, and the target data node performs marking processing, thereby reducing write amplification during garbage collection and reducing garbage data.
It effectively reduces write amplification caused by garbage collection, reduces garbage data, improves system performance, and reduces storage costs.
Figure CN115509440B_ABST
Abstract
Description
Technical Field
[0001] This specification relates to the field of information storage technology, and in particular to a storage system and a data processing method. One or more embodiments of this specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium.
Background Technology
[0002] With the continuous development of internet technology, distributed storage systems are now widely used to store massive amounts of data reliably. Distributed storage systems, such as key-value distributed storage systems, widely employ a three-replica mechanism that stores data in different locations to improve reliability. Current distributed storage systems have a three-layer architecture: a bottom-layer distributed file system, a middle-layer key-value storage engine, and an upper-layer computing engine. In this structure, to ensure data reliability, data deletion and modification are performed in an append-only manner. In a multi-layered distributed storage system, both the middle key-value storage engine layer and the bottom single-machine storage engine layer process data this way. Releasing storage space afterward requires garbage collection at each layer, which leads to greater write amplification and more garbage data, and ultimately degrades overall system performance.
[0003] Therefore, how to reduce write amplification and garbage data during garbage collection in distributed storage systems, lower storage costs, and improve system performance is an urgent problem to be solved.
Summary of the Invention
[0004] In view of this, embodiments of this specification provide a storage system that offers Trim capability, which can reduce write amplification caused by garbage collection, improve overall system performance, reduce garbage data, and lower storage costs. One or more embodiments of this specification also relate to data processing methods, data processing apparatuses, a computing device, a computer-readable storage medium, and a computer program to address technical deficiencies in the prior art.
[0005] According to a first aspect of the embodiments of this specification, a storage system is provided, the system including a client and a data node; the client is configured to receive a marking operation instruction for a data file, parse the marking operation instruction to obtain marking operation information, generate a data operation request for a corresponding target data node based on the marking operation information, and send the data operation request to the target data node; the target data node is configured to receive the data operation request, and mark the data range to be operated on in the local storage space associated with the data file according to the data operation request.
[0006] According to a second aspect of the embodiments of this specification, a data processing method is provided, applied to a client in a storage system, comprising:
[0007] Receive marking operation instructions for data files;
[0008] Parse the marking operation instructions to obtain marking operation information;
[0009] Generate a data operation request for the corresponding target data node based on the marking operation information, and send the data operation request to the target data node.
[0010] According to a third aspect of the embodiments of this specification, a data processing method is provided, applied to a target data node in a storage system, comprising:
[0011] Receive requests for data operations on data files;
[0012] Based on the data operation request, the data range to be operated on in the local storage space associated with the data file is marked.
[0013] According to a fourth aspect of the embodiments of this specification, a data processing apparatus is provided, applied to a client in a storage system, comprising:
[0014] The receiving module is configured to receive marking operation instructions for data files;
[0015] The parsing module is configured to parse the marking operation instructions to obtain marking operation information;
[0016] The sending module is configured to generate a data operation request for the corresponding target data node based on the marking operation information, and send the data operation request to the target data node.
[0017] According to a fifth aspect of the embodiments of this specification, a data processing apparatus is provided, applied to a target data node in a storage system, comprising:
[0018] The receiving module is configured to receive requests for data operations on data files;
[0019] The marking module is configured to mark the data range to be operated on in the local storage space that is associated with the data file, according to the data operation request.
[0020] According to a sixth aspect of the embodiments of this specification, a computing device is provided, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor executes the computer instructions to implement the steps of the data processing method.
[0021] According to a seventh aspect of the embodiments of this specification, a computer-readable storage medium is provided that stores computer instructions which, when executed by a processor, implement the steps of the data processing method.
[0022] According to an eighth aspect of the embodiments of this specification, a computer program is provided, wherein when the computer program is executed in a computer, it causes the computer to perform the steps of the above-described data processing method.
[0023] The storage system provided in this specification includes a client and data nodes. The client is configured to receive marking operation instructions for data files, parse the marking operation instructions to obtain marking operation information, generate a data operation request for a corresponding target data node based on the marking operation information, and send the data operation request to the target data node. The target data node is configured to receive the data operation request and, based on the data operation request, mark the data range to be operated on in the local storage space associated with the data file.
[0024] In one embodiment of this specification, the client converts a marking operation instruction for a data file into data operation requests for data ranges on data nodes. Data that is no longer needed on the data nodes is marked, so that invalid data does not need to be processed again during subsequent garbage collection. This effectively reduces the write amplification caused by garbage collection and reduces garbage data. It also allows new data to be written directly to the marked data range when new data is written to a data node, which improves write efficiency, optimizes storage on the data nodes, and reduces storage costs.
Attached Figure Description
[0025] Figure 1 is a schematic diagram illustrating a scenario in which a storage system performs a marking operation, as provided in one embodiment of this specification;
[0026] Figure 2a is a schematic diagram of the architecture of a storage system provided in one embodiment of this specification;
[0027] Figure 2b is a schematic diagram of the structure of a storage system provided in one embodiment of this specification;
[0028] Figure 3 is a flowchart illustrating a data processing method applied to a client in a storage system, as provided in one embodiment of this specification;
[0029] Figure 4 is a flowchart illustrating a data processing method applied to a target data node in a storage system, as provided in one embodiment of this specification;
[0030] Figure 5 is a schematic diagram illustrating the interaction between a client and a target data node in a storage system, as provided in one embodiment of this specification;
[0031] Figure 6 is a schematic diagram of the structure of a data processing apparatus applied to a client in a storage system, as provided in one embodiment of this specification;
[0032] Figure 7 is a schematic diagram of the structure of a data processing apparatus applied to a target data node in a storage system, as provided in one embodiment of this specification;
[0033] Figure 8 is a structural block diagram of a computing device provided in one embodiment of this specification.
Detailed Implementation
[0034] Many specific details are set forth in the following description to provide a full understanding of this specification. However, this specification can be implemented in many other ways than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of this specification. Therefore, this specification is not limited to the specific implementations disclosed below.
[0035] The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. The singular forms "a," "said," and "the" as used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of this specification refers to any or all possible combinations of one or more of the associated listed items.
[0036] It should be understood that although the terms first, second, etc., may be used to describe various information in one or more embodiments of this specification, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, first may also be referred to as second without departing from the scope of one or more embodiments of this specification, and similarly, second may also be referred to as first. Depending on the context, the word "if" as used herein may be interpreted as "at the time of," "when," or "in response to a determination."
[0037] First, the terms and concepts used in one or more embodiments of this specification will be explained.
[0038] Append-only: Data can only be written by appending; random in-place modification is not supported.
[0039] File: A contiguous block of stored data uniquely identified by a filename or file ID. Each file consists of one or more chunks.
[0040] Chunk: A continuous range of data stored in a file. Each Chunk can be a multi-replica Chunk or an erasure-coded Chunk.
[0041] Replica: A contiguous block of data that a Chunk stores on a single-machine storage engine. In a multi-replica Chunk, every replica holds identical data; in an erasure-coded Chunk, each replica stores a data shard produced according to the erasure coding slicing and arrangement rules, and each shard contains different data.
[0042] KV storage engine: refers to an engine built on top of a storage system that stores data in a key-value manner and provides access to data in a key manner.
[0043] ZNS-SSD (Zoned Namespace Solid State Disk): A device developed on the basis of the SSD. It moves the FTL (Flash Translation Layer) from inside the SSD up to the host layer, exposing the inside of the SSD to the host so that users can flexibly implement their own FTL according to their needs.
[0044] Currently, a three-tier architecture is widely used in various distributed storage systems such as object storage, file storage, table storage, and block storage, as well as database systems such as distributed analytical databases, distributed transactional databases, and hybrid analytical-transactional databases. The bottom layer uses an append-only distributed file system for highly reliable and available persistent storage; the middle layer uses various key-value (KV) storage engines as row-oriented, column-oriented, or hybrid engines; and the upper layer is the front-end access layer or computation engine layer. In this architecture, data deletion, modification, and updates are all implemented by writing logs in an append-only manner to fully utilize the access characteristics of the storage medium and improve write throughput. Then, a background garbage collection (GC) process releases the physical space of overwritten or deleted data. Therefore, in this architecture, KV layer GC optimization is a key technology and a direction for continuous optimization and breakthroughs by relevant technical personnel.
[0045] As more and more underlying distributed file systems support append-only storage media such as ZNS SSDs and SMR HDDs, the single-machine storage layer of the distributed file system also adopts a key-value (KV) storage structure. Data deletion, modification, and updates are implemented by appending log records, and background garbage collection (GC) is relied on to actually release the physical space of overwritten or deleted data. Because the intermediate KV-type storage engine layer and the underlying distributed file system layer each run GC independently, without coordination, the result is greater write amplification and more garbage data, which hurts overall system performance and cost and reduces the system's competitiveness.
[0046] Based on this, this specification provides a storage system that offers Trim capability, which reduces write amplification caused by garbage collection, improves overall system performance, reduces garbage data, and lowers storage costs. Two data processing methods are also provided. This specification also relates to two data processing devices, a computing device, a computer-readable storage medium, and a computer program, which will be described in detail in the following embodiments.
[0047] Figure 1 illustrates a scenario in which a storage system performs a marking operation according to an embodiment of this specification. In this scenario, after receiving a marking operation instruction for a data file, the client in the storage system converts the marking operation information carried in the instruction into logical address information. The logical address information is the specific location of the data file within a data block on a data node. The data node can then map the logical address information to the physical address information of the data file, i.e., the specific location of the data on the disk.
[0048] Referring to Figure 1, the client receives a marking operation instruction for a data file, i.e., a Trim instruction for the data file. The marking operation instruction carries marking operation information, including key information such as the file's unique identifier (fileId), the file offset (Offset), and the Trim length (Length). The client converts the marking operation instruction into a data-block-level data operation request for the target data node and sends it to the target data node. The target data node then performs the actual Trim operation based on the operation description information carried in the data operation request, thereby completing the marking of the data file. During subsequent garbage collection, the data node can directly clean up the marked data, which reduces the write amplification caused by garbage collection and improves processing efficiency.
[0049] Figure 2a illustrates the architecture of a storage system according to an embodiment of this specification. The architecture includes a metadata service, a client, and a single-machine storage engine. Metadata is system data that describes the file system and file characteristics, such as file type, file size, access permissions, and data index information. Before accessing file data, users need to access the file's metadata to obtain basic file attribute information and data index information. The client is responsible for sending read/write requests to the single-machine storage engine. If three copies of a data file need to be written, the request is first sent to the client, which then forwards it to the single-machine storage engine. The single-machine storage engine, also known as a data node or data server, stores file data and ensures data availability and integrity. By distributing data across multiple single-machine storage engines, the storage system can scale performance and capacity together, which gives the system strong scalability.
[0050] Figure 2b shows a schematic diagram of the structure of a storage system 200 according to an embodiment of this specification. The storage system 200 includes a client 210 and a data node 220. The marking operation, i.e., the trim operation, performed by the storage system is described further below.
[0051] The client 210 is configured to receive a marking operation instruction for a data file, parse the marking operation instruction to obtain marking operation information, generate a data operation request for the corresponding target data node based on the marking operation information, and send the data operation request to the target data node.
[0052] The target data node 220 is used to receive the data operation request and mark the data range to be operated on in the local storage space associated with the data file according to the data operation request.
[0053] In this context, the data file can be understood as a data file that the project no longer needs. When the intermediate KV-type storage engine layer needs to delete invalid data in the project, it can call the TRIM interface provided by the storage system to mark the invalid data stored on the data nodes of the storage system. During subsequent garbage collection, the invalid data can be directly overwritten based on the marking information, or the marked invalid data can be deleted, which makes it easier to write new data and improves write efficiency. The marking operation instruction can be understood as an instruction that calls the storage system's TRIM interface to execute the TRIM operation. Specifically, the client 210 includes a marking operation interface. The client 210 is configured to call the marking operation interface to parse the marking operation instruction and obtain the marking operation information, wherein the marking operation information includes file identification information, file offset information, and file marking length information.
[0054] In practical applications, the storage system can be a distributed file storage system. After receiving the marking operation instruction, the client in the distributed file storage system calls the marking operation interface to parse it. The marking operation interface is the interface that provides the trim capability, and subsequent trim operations are performed through this trim interface. Calling the trim interface parses out the marking operation information carried in the marking operation instruction, which includes key details such as the file identifier (fileId), the file offset (the offset within the file), and the trim length. This marking operation information can be understood as the logical address information of the data file. To enable a data node to locate the data file in its current storage space, the marking operation information needs to be converted into a data operation request that the target data node can handle. The target data node can then map the logical address information carried in the data operation request to the physical address information on the disk where the data file resides. A data node that holds data associated with the data file is a target data node for this trim operation. Each target data node has its own data operation request, which contains the offset and length of the data file's data within that target data node. The data range to be processed by the trim operation is determined from the data operation request. This data range can be understood as the data range to be deleted, and the marked data within it can subsequently be deleted.
[0055] Among all data nodes, the target data node 220 is the one that needs to be trimmed in this operation, and it receives the corresponding data operation request. A data node is a single-machine storage engine in the storage system. The data operation request includes the Chunk-level ChunkId, ChunkOffset, and Length managed by the single-machine storage engine. According to the data operation request, the target data node further maps to the physical address information on the physical hard disk where the data file is located and performs the actual trim operation, that is, it marks the data range to be operated on in the local storage space associated with the data file.
[0056] In practical applications, because the intermediate KV engine layer and the underlying distributed file system each perform garbage collection independently, without coordination, write amplification grows and garbage data increases. In the embodiments described in this specification, when the intermediate KV engine layer performs a trim operation on invalid data, it calls the underlying distributed file system so that the trim operation is performed there in parallel. The data nodes in the distributed file system are thus trimmed as well, allowing subsequent garbage collection to directly overwrite the trimmed data and improving the efficiency of releasing data.
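The effect of coordinated trimming on garbage collection traffic can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the segment layout, the range sizes, and the function name `gc_copied_bytes` are hypothetical, and the model simply assumes that lower-layer GC rewrites every range that has not been marked.

```python
def gc_copied_bytes(ranges, trimmed):
    """Bytes the lower-layer GC must rewrite to reclaim one segment.

    ranges:  {range_id: size_in_bytes} for every range in the segment
    trimmed: set of range ids that the upper KV layer has marked via trim
    Assumption: GC copies every range that is not marked.
    """
    return sum(size for rid, size in ranges.items() if rid not in trimmed)


KiB = 1024
segment = {"r0": 512 * KiB, "r1": 256 * KiB, "r2": 256 * KiB}  # hypothetical sizes
dead_in_kv_layer = {"r1", "r2"}  # already invalid from the KV engine's point of view

# Without trim, the file system cannot tell that r1 and r2 are dead and copies them.
without_trim = gc_copied_bytes(segment, trimmed=set())
# With trim, only the range that is still valid is rewritten.
with_trim = gc_copied_bytes(segment, trimmed=dead_in_kv_layer)
```

In this toy segment, half of the bytes that uncoordinated GC would rewrite are skipped once the dead ranges are marked.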
[0057] In one embodiment of this specification, the storage system includes a client and 10 data nodes. The client receives a marking operation instruction for a data file, which is the user data file of inactive user A. The marking operation instruction is an instruction to mark the user data file of inactive user A, i.e., a trim instruction. The client parses the marking operation instruction to obtain marking operation information, which includes the file identifier "A" of the user data file, the offset within the file, and the trim length. Based on the marking operation information, the client determines that user A's user data file is stored on data node 1 and data node 2. It then generates data operation request 1 for target data node 1 and data operation request 2 for target data node 2. Data operation request 1 includes the specific location of one portion of the data file in the data block of target data node 1, and data operation request 2 includes the specific location of another portion of the data file in the data block of target data node 2. Each data operation request is sent to its target data node, i.e., data operation request 1 is sent to target data node 1, and data operation request 2 is sent to target data node 2.
[0058] The client converts file-level marking operation information of data files into chunk-level data operation requests for data nodes. This allows subsequent target data nodes to determine the data range to be operated on based on the data operation requests, and then perform marking processing.
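The file-level to chunk-level conversion described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the fixed chunk size, the `chunk_locations` layout map, the chunk id format, and the names `ChunkTrimRequest` and `split_trim` are all assumptions introduced here.

```python
from dataclasses import dataclass

CHUNK_SIZE = 4 * 1024 * 1024  # assumed fixed chunk size (4 MiB); not from the source


@dataclass(frozen=True)
class ChunkTrimRequest:
    node: str          # target data node holding the chunk
    chunk_id: str      # hypothetical id format: "<file_id>-<chunk index>"
    chunk_offset: int  # offset inside the chunk, in bytes
    length: int        # number of bytes to mark


def split_trim(file_id, offset, length, chunk_locations):
    """Convert a file-level trim (file_id, offset, length) into chunk-level requests.

    chunk_locations maps a chunk index to the nodes that hold that chunk
    (hypothetical layout metadata, e.g. obtained from a metadata service).
    """
    requests = []
    end = offset + length
    while offset < end:
        index = offset // CHUNK_SIZE
        chunk_offset = offset % CHUNK_SIZE
        # Never cross a chunk boundary within one request.
        piece = min(CHUNK_SIZE - chunk_offset, end - offset)
        for node in chunk_locations[index]:
            requests.append(
                ChunkTrimRequest(node, f"{file_id}-{index}", chunk_offset, piece)
            )
        offset += piece
    return requests


MiB = 1024 * 1024
layout = {0: ["node1"], 1: ["node2"]}  # user A's file spans two data nodes
requests = split_trim("A", 3 * MiB, 2 * MiB, layout)
```

Here a trim of 2 MiB starting at file offset 3 MiB yields one request for the last 1 MiB of the chunk on `node1` and one for the first 1 MiB of the chunk on `node2`, mirroring data operation requests 1 and 2 in the example.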
[0059] In one embodiment of this specification, following the previous example, target data node 1 receives data operation request 1, which carries logical address information. Based on the logical address information, the physical address information is determined to be offset = 1MB + 1KB and length = 1MB + 1KB. Therefore, the data range to be operated on is determined in target data node 1 as <1MB, 1MB + 1KB>, and target data node 1 marks this data range. Target data node 2 receives data operation request 2, which carries logical address information. Based on the logical address information, the physical address information is determined to be offset = 2MB + 1KB and length = 2MB + 1KB. Therefore, the data range to be operated on is determined in target data node 2 as <2MB, 2MB + 1KB>, and target data node 2 marks this data range.
[0060] The storage system with trim capability provided in the embodiments of this specification allows the intermediate KV storage engine to use the storage system's trim capability to eliminate the GC traffic generated by file rewriting. This significantly reduces the write amplification caused by GC, improves overall system performance, reduces garbage data, lowers data storage costs, and thus enhances system competitiveness.
[0061] Specifically, the client generates a data operation request for the corresponding target data node based on the marking operation information. This can be achieved by first determining the logical address information based on the marking operation instructions, and then generating the data operation request for the target data node. The client is configured to determine the logical address information corresponding to the data file based on the marking operation information, and then generate the corresponding data operation request for the target data node based on the logical address information.
[0062] In practical applications, the marking operation information is expressed relative to the data file, while the data file is stored in different data blocks on different data nodes. When a trim operation is needed on the data file, the logical address of the data file within the data blocks must therefore be determined first, and the data nodes holding those data blocks are then found from the logical address. For example, the marking operation instruction carries the marking operation information entered by the user: fileId, fileOffset, and Length. Based on it, the client determines the corresponding logical address information in target data node 1 to be ChunkId, ChunkOffset = 1MB, and Length = 1MB. Target data node 1 contains multiple data blocks; that is, the data node containing the relevant data blocks is determined to be the target data node. Each data block has a different data range, such as <0, 0>, <1MB, 1MB+1KB>, and <2MB, 2MB+1KB>. The client can thus generate the data operation request for target data node 1. Target data node 1 can then determine the physical address information of the data file on the hard drive from the logical address information carried in the data operation request and perform the actual trim operation.
[0063] In practice, after receiving a data operation request, target data node 1 can determine the operation description information (i.e., the physical address information) based on the logical address information carried in the data operation request, determine the data range to be operated on, and then perform the trim operation. Specifically, the target data node is configured to determine the operation description information based on the logical address information carried in the data operation request, determine, based on the operation description information, the data range to be operated on that is associated with the data file in the local storage space, and modify and save the attribute information of that data range as the marking processing of the data range.
[0064] The operation description information can be understood as the physical address information of the data file in the target data node. Referring to the example above, the logical address information is ChunkId, ChunkOffset = 1MB, and Length = 1MB. This is further mapped to the physical address information: Offset = 1MB + 1KB, Length = 1MB + 1KB. Thus, the interval corresponding to the data block <1MB, 1MB + 1KB> is determined as the data interval to be operated on among multiple data blocks in target data node 1, and this data interval is marked. This facilitates the subsequent garbage collection (GC) process of the data node, allowing direct release of storage space from the marked data interval.
[0065] In practical applications, each data range includes corresponding range header data, which contains the attribute information of the corresponding data range, including corresponding marking information. Marking a data range can be understood as modifying the attribute information of the data range, changing the marking information in the attribute information to marked and saving it. Thus, if the marking information in the attribute information is identified as marked, it can be determined that the data range has been trimmed, and the data written to that data range can be overwritten.
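Marking by modifying the attribute information in the range header can be sketched as below. The class names, the in-memory representation, and the rule that only fully covered ranges are marked are assumptions made for illustration; the source does not specify how partially covered ranges are handled.

```python
from dataclasses import dataclass, field


@dataclass
class RangeHeader:
    marked: bool = False  # marking information kept in the range's attribute information


@dataclass
class DataRange:
    offset: int  # physical start offset in local storage, in bytes
    length: int
    header: RangeHeader = field(default_factory=RangeHeader)


class DataNode:
    def __init__(self, ranges):
        self.ranges = list(ranges)

    def trim(self, offset, length):
        """Mark every range fully covered by [offset, offset + length)."""
        end = offset + length
        marked = []
        for r in self.ranges:
            if offset <= r.offset and r.offset + r.length <= end:
                r.header.marked = True  # modify the attribute information in place
                marked.append(r)
        return marked

    def reusable_ranges(self):
        """Ranges that GC or a later write may overwrite directly."""
        return [r for r in self.ranges if r.header.marked]


KiB = 1024
node = DataNode([DataRange(0, KiB), DataRange(KiB, KiB), DataRange(2 * KiB, KiB)])
node.trim(KiB, KiB)  # marks only the middle range
reusable_offsets = [r.offset for r in node.reusable_ranges()]
```

A real data node would also persist the modified header; the sketch keeps everything in memory to stay short.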
[0066] A trim operation on a data node may fail. In that case, the client can continue with the operation that corresponds to the storage state information. Specifically, when the client determines that the operation feedback information corresponding to the data operation request indicates an operation failure, it queries the storage state information of the marking operation instruction and performs the operation corresponding to that storage state information.
[0067] The operation feedback information can be understood as the feedback the client receives after the trim operation is performed. From it, the client can determine whether the marking operation succeeded. If the operation failed, the client performs the operation corresponding to the storage state information. In practical applications, the trim operation is considered successful only after all data blocks associated with the data file have been marked successfully.
[0068] In a specific implementation, the storage state information is one of two types: persistent storage state and non-persistent storage state. In the persistent storage state, the client persists the data operation request when performing a trim operation; that is, it caches the data operation request for this trim operation and repeats the step of sending the data operation request to the target data node until the trim operation succeeds. In the non-persistent storage state, when the trim operation fails, the client reports the failure to the intermediate KV-type storage engine layer. That layer records the failed trim operation so that it can resend the marking operation instruction for it to the client until the trim operation succeeds. Specifically, when the storage state information is the persistent storage state, the client is configured to execute the step of sending the data operation request to the target data node based on the persisted data operation request.
[0069] When the storage state information is in a non-persistent storage state, the processing result information corresponding to the marking operation instruction is generated and sent to the index system corresponding to the storage system, wherein the index system is used to resend the marking operation instruction to the storage system.
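The client's decision after receiving operation feedback can be sketched as a small dispatch function. The enum, the function name `handle_feedback`, the callback parameters, and the returned strings are hypothetical; only the branching on the two storage states and the rule that every data block must succeed come from the description above.

```python
from enum import Enum


class StorageState(Enum):
    PERSISTENT = "persistent"
    NON_PERSISTENT = "non-persistent"


def handle_feedback(feedback, state, resend, report_to_index):
    """Decide what the client does once the data nodes have answered.

    feedback maps a request id to True (marked) or False (failed).
    resend and report_to_index are callbacks supplied by the caller.
    """
    failed = [rid for rid, ok in feedback.items() if not ok]
    if not failed:
        return "success"      # every data block was marked
    if state is StorageState.PERSISTENT:
        resend(failed)        # client keeps the requests and retries them itself
        return "retrying"
    report_to_index(failed)   # upper KV layer records the failure and reissues the trim
    return "reported"


resent, reported = [], []
outcome_ok = handle_feedback(
    {"req-1": True, "req-2": True}, StorageState.PERSISTENT, resent.extend, reported.extend
)
outcome_persistent = handle_feedback(
    {"req-1": True, "req-2": False}, StorageState.PERSISTENT, resent.extend, reported.extend
)
outcome_non_persistent = handle_feedback(
    {"req-1": False}, StorageState.NON_PERSISTENT, resent.extend, reported.extend
)
```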
[0070] In practical applications, when the storage state is persistent, the storage system client can persist data operation requests, that is, save the data operation requests locally for easy failover processing. For example, in a three-replica scenario, some data nodes may successfully perform a trim operation, but the client crashes; or some data nodes may crash, preventing immediate trim operation execution. By persisting the data operation requests, when the client restarts, it can resend the data operation requests to the target data node, which will then perform the corresponding trim operation; or, when the crashed data node restarts, the client can resend the corresponding data operation requests to that data node, causing it to perform the trim operation, thus ensuring the normal execution of the trim operation.
[0071] In one embodiment of this specification, after receiving a marking operation instruction, the client generates a data operation request for the corresponding target data node according to the marking operation instruction and caches the data operation request. After the client crashes and restarts, the client reads the unexecuted data operation request from the cached data and sends the data operation request to the corresponding target data node, so that the data node can execute the corresponding data operation normally.
[0072] In another embodiment of this specification, when the client sends a data operation request to the target data node, if the target data node crashes and cannot receive the data operation request, it will be unable to perform the corresponding data marking process. When the target data node restarts, the client will send the corresponding data operation request to the target data node again, so that the target data node can complete the data marking process corresponding to the data operation request.
[0073] In practical applications, when the storage state is non-persistent, the storage system client does not need to persist the trim operation, that is, it does not save the data operation request locally. When a data node or client failover occurs, trim failure information is returned to the index system, that is, the upper KV engine layer. The KV engine records the unsuccessful trim operation and retries it afterward.
[0074] In one embodiment of this specification, when the client sends a data operation request to the target data node, if the target data node crashes and cannot receive the data operation request, it will be unable to perform the corresponding data marking process. The client will then return the data operation request to the upper KV-type engine layer, which will record the unsuccessful data operation request and resend the marking operation instruction to the client at an appropriate time.
[0075] By employing the two failover methods described above, when a failed trim operation is encountered, the failed trim operation can be recorded and retried until the trim operation is completed, providing a reliable trim execution method and improving system reliability.
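The two failover methods can be illustrated with a minimal sketch. It is not the implementation of this specification: the class `TrimClient`, the callbacks `send` and `report_failure`, and the in-memory `pending` list (standing in for a local persistent store) are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A trim request is modelled as (chunk_id, chunk_offset, length); illustrative only.
TrimRequest = Tuple[int, int, int]


@dataclass
class TrimClient:
    send: Callable[[TrimRequest], bool]            # True when the data node acknowledges the trim
    report_failure: Callable[[TrimRequest], None]  # hypothetical callback to the upper KV engine layer
    persistent: bool = True
    # Assumption: this list stands in for a local persistent store of requests.
    pending: List[TrimRequest] = field(default_factory=list)

    def trim(self, request: TrimRequest) -> bool:
        if self.persistent:
            # Persist before sending so the request survives a client crash.
            self.pending.append(request)
        if self.send(request):
            if self.persistent:
                self.pending.remove(request)
            return True
        if not self.persistent:
            # Non-persistent state: report the failure so the KV engine can retry later.
            self.report_failure(request)
        return False

    def retry_pending(self) -> int:
        """Resend persisted requests, for example after a client or data node restart."""
        completed = 0
        for request in list(self.pending):
            if self.send(request):
                self.pending.remove(request)
                completed += 1
        return completed
```

In this sketch the persistent state keeps a failed request on the client until a later `retry_pending` call succeeds, whereas the non-persistent state leaves nothing on the client and relies on the upper layer to reissue the instruction.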
[0076] After the target data node completes the trim operation on invalid data, when new data needs to be written to the target data node, the target data node can write the new data into the data range marked with trim, thereby achieving fast data writing and improving storage capacity. Specifically, the target data node is also used to receive a data write instruction for target data, and in response to the data write instruction, determine a data range to be used in the local storage space, delete the data in the data range to be used, and write the target data into the data range to be used, wherein the data range to be used belongs to the data range to be operated.
[0077] The data write instruction can be understood as an instruction to write the target data into the data block in the target data node. The data range to be used can be understood as a data range that has already undergone the trim operation. The data in the data range to be used is marked as invalid data, so the data node can directly identify the invalid data and overwrite it with the target data, thus completing the writing process of the target data.
[0078] In practical applications, since the TRIM operation only marks the data range to be operated on in the data node and does not delete the data in the data range to be operated on, when new data needs to be written, the data in the data range to be operated on can be deleted first, and then the target data can be written to the data range to be operated on.
[0079] In one embodiment of this specification, the target data node receives a data write instruction for the target data, determines the data range to be used in the local storage space according to the data write instruction, the data in the data range to be used is invalid data, deletes the invalid data in the data range to be used, and writes the target data into the data range to be used to complete the writing process of the target data.
[0080] In practice, when the target data writing range is larger than the data range to be operated on, it is necessary to ensure that the range adjacent to the data range to be operated on is also the data range to be operated on or a data range without data, so that the two data ranges form a continuous storage space for the target data to be written.
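The reuse of trimmed ranges for new writes, including the case where adjacent trimmed ranges together form a continuous space, can be sketched as follows. This is a simplified in-memory model under stated assumptions: the `Chunk` class, the `(offset, length)` representation of a marked range, and the first-fit search are hypothetical choices and are not taken from this specification.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # (offset, length) of a trimmed range; illustrative representation


def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Coalesce trimmed ranges that touch or overlap into contiguous ranges."""
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            prev_offset, prev_length = merged[-1]
            end = max(prev_offset + prev_length, offset + length)
            merged[-1] = (prev_offset, end - prev_offset)
        else:
            merged.append((offset, length))
    return merged


class Chunk:
    """Hypothetical in-memory stand-in for a data block on a data node."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.trimmed: List[Range] = []  # ranges marked by earlier trim operations

    def trim(self, offset: int, length: int) -> None:
        # Marking only: the bytes stay in place until the range is reused.
        self.trimmed = merge_ranges(self.trimmed + [(offset, length)])

    def write_into_trimmed(self, payload: bytes) -> Optional[int]:
        """Write payload into the first trimmed range that can hold it; return its offset."""
        for index, (offset, length) in enumerate(self.trimmed):
            if length >= len(payload):
                end = offset + len(payload)
                self.data[offset:end] = bytes(len(payload))  # delete the stale data
                self.data[offset:end] = payload              # then write the target data
                remaining = length - len(payload)
                if remaining:
                    self.trimmed[index] = (end, remaining)
                else:
                    del self.trimmed[index]
                return offset
        return None  # no trimmed range is large enough
```

Merging at trim time means a write larger than any single marked range can still succeed when its neighbours were also marked, which corresponds to the adjacency requirement described above.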
[0081] To ensure the correct operation of the system, the system that uses the storage system needs to guarantee that data that has already undergone a trim operation will not be read again. If the upper-layer key-value storage engine layer nevertheless reads data that has undergone a trim operation, the client will return an undefined feedback result to the upper-layer key-value storage engine layer. Specifically, the client is also used to receive data read instructions for the data file, and send a data read request to the target data node according to the data read instructions; the target data node is also used to receive the data read request, generate a feedback result according to the data read request, and send it to the client.
[0082] The data read instruction can be understood as an instruction to read the data stored in the data node. After receiving the data read instruction for the data file, the client will send a data read request to the target data node. Similar to the marking instruction for the data file, the client will also convert the file-level data read instruction into a chunk-level data read request in the data node, so that the target data node can return the corresponding data read result according to the data read request.
[0083] In practical applications, since the data file has already undergone the trim operation, the feedback result returned by the target data node to the client is always undefined. However, there are three different situations: all data in the data file has successfully undergone the trim operation; none of the data in the data file has successfully undergone the trim operation; and some data in the data file has successfully undergone the trim operation, while some has not.
[0084] When all trim operations are successfully executed, the target data node's feedback result is all zeros; when no trim operation is successfully executed, the target data node's feedback result is the original data before trimming; when some data successfully undergoes the trim operation and some does not, the target data node's feedback result is partly zeros and partly the original data before trimming. Specifically, the target data node is used to generate a feedback result indicating that the data is unreadable when the data operation request is successfully executed; when the data operation request fails, it uses the original data in the data range to be operated as the feedback result; when the data operation request partially fails, it uses the original data in the part of the data range to be operated for which the data operation request failed as the first feedback result, generates a second feedback result based on the data in the part of the data range to be operated for which the data operation request succeeded, and generates the feedback result based on the first feedback result and the second feedback result.
[0085] In practice, the feedback result of the unreadable state can be understood as data that is all 0. After the data has been trimmed, the trimmed data can no longer be read, and the target data node will report the unreadable state to the client. When the trim operation fails, the target data node will report the original data of the data file to the client. When some data is successfully trimmed and some data fails to trim, the target data node will report some unreadable data and the original data to the client.
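The three read outcomes can be illustrated with a short sketch. The function name `read_feedback` and the representation of successfully trimmed ranges as `(offset, length)` pairs are assumptions made for illustration only.

```python
from typing import List, Tuple


def read_feedback(original: bytes, trimmed: List[Tuple[int, int]]) -> bytes:
    """Return zeros for ranges whose trim succeeded and the original bytes elsewhere."""
    result = bytearray(original)
    for offset, length in trimmed:
        end = min(offset + length, len(result))
        # A successfully trimmed range reads back as all zeros (the unreadable state).
        result[offset:end] = bytes(max(0, end - offset))
    return bytes(result)
```

With every range trimmed the result is all zeros, with no range trimmed it is the original data, and with a partial trim it is a mixture of the two.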
[0086] In another embodiment of this specification, the client is configured to receive a marking operation instruction for a data file, store the marking operation information associated with the marking operation instruction in an operation log, update the data operation table associated with the data file according to the marking operation information recorded in the operation log when the operation log meets the log information processing conditions, generate a data operation request for the corresponding target data node based on the updated data operation table, and send the data operation request to the target data node.
[0087] Specifically, a data file is a file type in a computer system that can be used to store various types of data. In a storage system, a data file has a file identifier and consists of one or more data ranges, each containing multiple data blocks. Marking operation instructions are computer instructions used to mark data within a data file. These instructions typically carry marking operation information, including information about the data range to be marked and the sequence number (ID) under which the marking operation information is stored in the operation log. The operation log is used to record marking operation information.
[0088] After the marking operation information associated with the marking operation instructions is stored in the operation log, the marking operation information recorded in the operation log can be processed if the operation log meets the log information processing conditions. The log information processing conditions are preset conditions used to determine whether the marking operation information stored in the operation log can be processed. In this embodiment, the log information processing conditions include: determining that the operation log meets the log information processing conditions when the number of marking operation information entries stored in the operation log reaches a set threshold; and, where a preset log processing period is set, determining that the operation log meets the log information processing conditions when the preset log processing period is reached. The data operation table is used to store relevant information about the data to be processed corresponding to the data file, including but not limited to data range information, offset address information, length information, and processing status information.
[0089] Based on this, after updating the data operation table associated with the data file according to the marking operation information recorded in the operation log, a data operation request for the corresponding target data node is generated based on the updated data operation table, and the data operation request is sent to the target data node. After receiving the data operation request, the target data node performs marking operations on the data stored in the data node.
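The operation log and data operation table described above can be sketched as follows. Only the entry-count threshold is modelled; the periodic condition is omitted for brevity. The class `MarkLog`, its field names, and the dictionary layout of table rows and requests are hypothetical and are not defined by this specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MarkLog:
    threshold: int = 3  # assumed flush threshold
    # Each log entry is (sequence number, file id, offset, length).
    entries: List[Tuple[int, str, int, int]] = field(default_factory=list)
    # Data operation table: file id -> rows with range and processing status.
    table: Dict[str, List[dict]] = field(default_factory=dict)
    next_seq: int = 0

    def record(self, file_id: str, offset: int, length: int) -> List[dict]:
        """Append marking operation information; flush once the threshold is reached."""
        self.entries.append((self.next_seq, file_id, offset, length))
        self.next_seq += 1
        if len(self.entries) >= self.threshold:
            return self.flush()
        return []

    def flush(self) -> List[dict]:
        """Update the table from the log and return the data operation requests to send."""
        requests = []
        for seq, file_id, offset, length in self.entries:
            row = {"seq": seq, "offset": offset, "length": length, "status": "pending"}
            self.table.setdefault(file_id, []).append(row)
            requests.append({"file_id": file_id, "offset": offset, "length": length})
        self.entries.clear()
        return requests
```

Batching entries before updating the table lets several marking operations on the same file be turned into requests together rather than one at a time.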
[0090] The storage system provided in this specification includes a client and data nodes. The client receives marking operation instructions for data files, parses the instructions to obtain marking operation information, generates a data operation request for a corresponding target data node based on the marking operation information, and sends the data operation request to the target data node. The target data node receives the data operation request and, based on the request, marks the data range in its local storage space associated with the data file. By converting the marking operation instructions for data files into data operation requests for data ranges of data nodes through the client, unnecessary data stored in the data nodes is marked. This eliminates the need to process invalid data during subsequent garbage collection, effectively reducing write amplification caused by garbage collection, reducing garbage data, and allowing new data to be written directly to the data range when writing to the data node, improving write efficiency, optimizing storage data nodes, and reducing storage costs.
[0091] Figure 3 shows a flowchart of a data processing method for a client in a storage system according to an embodiment of this specification, including steps S302 to S306.
[0092] Step S302: Receive a marking operation instruction for the data file.
[0093] Step S304: Parse the marking operation instruction to obtain marking operation information.
[0094] Step S306: Generate a data operation request for the corresponding target data node based on the marking operation information, and send the data operation request to the target data node.
[0095] Optionally, the method includes:
[0096] Based on the marking operation information, the logical address information corresponding to the data file is determined, and a data operation request for the corresponding target data node is generated based on the logical address information.
[0097] Optionally, the method further includes:
[0098] If the operation feedback information corresponding to the data operation request is determined to be an operation failure, the storage state information of the marking operation instruction is queried, and the operation processing corresponding to the storage state information is executed.
[0099] Optionally, the method includes:
[0100] If the storage state information is in a persistent storage state, based on the persistent data operation request, the step of sending the data operation request to the target data node is executed;
[0101] When the storage state information is in a non-persistent storage state, the processing result information corresponding to the marking operation instruction is generated and sent to the index system corresponding to the storage system, wherein the index system is used to resend the marking operation instruction to the storage system.
[0102] Optionally, the method further includes:
[0103] Receive a data read instruction for the data file, and send a data read request to the target data node according to the data read instruction.
[0104] Optionally, the method includes:
[0105] Receive a marking operation instruction for a data file, and store the marking operation information associated with the marking operation instruction in the operation log. If the operation log meets the log information processing conditions, update the data operation table associated with the data file according to the marking operation information recorded in the operation log.
[0106] Based on the updated data operation table, a data operation request is generated for the corresponding target data node, and the data operation request is sent to the target data node.
[0107] Optionally, the method includes:
[0108] The storage system is a distributed file system, and the client includes a tagging operation interface;
[0109] The client is used to call the marking operation interface to parse the marking operation instruction and obtain the marking operation information, wherein the marking operation information includes file identification information, file offset information, and file marking length information.
[0110] The data processing method provided in this embodiment is similar to the execution process of the client in the storage system provided in the above embodiment. The same or corresponding descriptions can be found in the above embodiments, and will not be repeated here.
[0111] This specification provides a data processing method for a client in a storage system, comprising receiving a marking operation instruction for a data file; parsing the marking operation instruction to obtain marking operation information; generating a data operation request for a corresponding target data node based on the marking operation information; and sending the data operation request to the target data node. By converting the marking operation instruction for the data file into a data operation request for a data range of a data node through the client, unnecessary data stored in the data node is marked. This eliminates the need to process invalid data during subsequent garbage collection, effectively reducing write amplification caused by garbage collection and minimizing garbage data.
[0112] Figure 4 shows a flowchart of a data processing method for a target data node in a storage system according to an embodiment of this specification, including steps S402 to S404.
[0113] Step S402: Receive a request for data operation on the data file.
[0114] Step S404: According to the data operation request, mark the data range to be operated on in the local storage space associated with the data file.
[0115] Optionally, the method includes:
[0116] Based on the logical address information carried in the data operation request, the operation description information is determined. Based on the operation description information, the data range to be operated associated with the data file is determined in the local storage space. The attribute information of the data range to be operated is modified and saved as a marking process for the data range to be operated.
[0117] Optionally, the method further includes:
[0118] The system receives a data write instruction for target data, and in response to the data write instruction, determines a data range to be used in the local storage space, deletes the data in the data range to be used, and writes the target data into the data range to be used, wherein the data range to be used belongs to the data range to be operated.
[0119] Optionally, the method further includes:
[0120] Receive a data read instruction for the data file, and send a data read request to the target data node according to the data read instruction;
[0121] The target data node is also used to receive the data read request, generate a feedback result based on the data read request, and send it to the client.
[0122] Optionally, the method further includes:
[0123] If the data operation request is executed successfully, a feedback result indicating that the data is in an unreadable state is generated;
[0124] If the data operation request fails, the original data in the data range to be operated on will be used as the feedback result.
[0125] If the data operation request partially fails, the original data in the part of the data range to be operated for which the data operation request failed is used as the first feedback result, and a second feedback result is generated based on the data in the part of the data range to be operated for which the data operation request succeeded. A feedback result is generated based on the first feedback result and the second feedback result.
[0126] The data processing method provided in this embodiment is similar to the execution process of the target data node in the storage system provided in the above embodiment. The same or corresponding descriptions can be found in the above embodiments, and will not be repeated here.
[0127] This specification provides a data processing method for a target data node in a storage system, comprising: receiving a data operation request for a data file; and marking a data range in the local storage space associated with the data file to be operated on according to the data operation request. By marking the data range in the local storage space associated with the data file according to the data operation request, invalid data does not need to be processed again during subsequent garbage collection, effectively reducing write amplification caused by garbage collection, reducing garbage data, and allowing new data to be written directly to the data range to be operated on, improving write efficiency, optimizing the storage data node, and reducing storage costs.
[0128] Figure 5 shows an interaction diagram between a client and a target data node in a storage system according to an embodiment of this specification, with specific steps including steps S502 to S526.
[0129] Step S502: The client receives a marking operation instruction for the data file.
[0130] The client receives a marking operation instruction for a data file, where the data file is a user's personal information file. The marking operation instruction is an instruction to perform a trim operation on the user's personal information file.
[0131] Step S504: The client parses the marking operation instruction to obtain the marking operation information.
[0132] After the client parses the marking operation instruction, the marking operation information obtained is fileId, fileOffset, and Length.
[0133] Step S506: The client determines the logical address information corresponding to the data file based on the marking operation information.
[0134] After the client obtains the marking operation instructions for the data file from the metadata management section, it can determine the logical address information of the data file based on the file-level operation information fileId, fileOffset, and Length carried in the marking operation information. The logical address information is the data block-level operation information chunkId, chunkOffset, and Length.
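The conversion from file-level information (fileOffset, Length) to data block-level information (chunkOffset, Length per chunk) can be sketched as follows. This specification does not state how the mapping is computed; the sketch assumes fixed-size chunks laid out consecutively within the file, and it returns a chunk index within the file rather than a real chunkId, which would come from metadata. The function name `to_chunk_ranges` and the parameter `chunk_size` are hypothetical.

```python
from typing import List, Tuple


def to_chunk_ranges(file_offset: int, length: int, chunk_size: int) -> List[Tuple[int, int, int]]:
    """Split a file-level range into (chunk_index, chunk_offset, length) pieces.

    Assumption: chunks have a fixed size and are laid out consecutively in the file.
    """
    if file_offset < 0 or length < 0 or chunk_size <= 0:
        raise ValueError("offset and length must be non-negative and chunk_size positive")
    ranges = []
    position = file_offset
    remaining = length
    while remaining > 0:
        chunk_index, chunk_offset = divmod(position, chunk_size)
        # A piece never crosses a chunk boundary.
        span = min(remaining, chunk_size - chunk_offset)
        ranges.append((chunk_index, chunk_offset, span))
        position += span
        remaining -= span
    return ranges
```

A file-level range that crosses a chunk boundary therefore yields one data operation request per affected chunk, each of which can be sent to the data node holding that chunk.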
[0135] Step S508: The client generates a data operation request for the corresponding target data node based on the logical address information, and sends the data operation request to the target data node.
[0136] The client generates a data operation request based on the parsed logical address information. The data operation request is a request to operate on the data on the data block in the data node. The data operation request is sent to the target data node, so that the target data node can further map the physical address information of the data file on the hard disk based on the logical address information, and determine the data space to be operated in the storage space, so as to process the data in the data space to be operated.
[0137] Step S510: The target data node receives a data operation request for the data file.
[0138] Step S512: The target data node determines the operation description information based on the logical address information carried in the data operation request, and determines the data range to be operated associated with the data file in the local storage space based on the operation description information.
[0139] The target data node determines the operation description information based on the logical address information carried in the data operation request, and determines the data range to be operated based on the operation description information. The operation description information can be understood as physical address information, and the data range to be operated is the storage space in the data block of the data node.
[0140] Step S514: The target data node modifies and saves the attribute information of the data range to be operated on, as a marking process for the data range to be operated on.
[0141] The target data node modifies the attribute information of the data range to be operated on and saves the modification record. The attribute information of the data range to be operated on can be the header data of the data block, such as the data block's capacity information and the data block's marking information (whether it has been processed by TRIM).
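Marking by modifying and saving the header attributes of a data block can be illustrated with a small sketch. The header layout is an assumption: the `ChunkHeader` fields, the `[offset, length]` pairs, and JSON as the saved form are hypothetical and are not specified here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ChunkHeader:
    capacity: int  # capacity information of the data block
    # Marking information: [offset, length] pairs processed by TRIM (assumed layout).
    trimmed: List[List[int]] = field(default_factory=list)


def mark_trimmed(header: ChunkHeader, offset: int, length: int) -> str:
    """Record a range as trimmed in the header and return the serialized header to save."""
    if offset < 0 or length <= 0 or offset + length > header.capacity:
        raise ValueError("range lies outside the data block")
    if [offset, length] not in header.trimmed:
        # Repeating the same request leaves the header unchanged, so retries are safe.
        header.trimmed.append([offset, length])
    return json.dumps(asdict(header), sort_keys=True)
```

Because only the header changes, the marking step does not rewrite the stored data, and a resent request after a failover has no further effect.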
[0142] Step S516: If the client determines that the operation feedback information corresponding to the data operation request is an operation failure, it queries the storage status information of the marked operation instruction and executes the operation processing corresponding to the storage status information.
[0143] If the client determines that the trim operation has failed, it queries the storage status information, which is divided into persistent storage status and non-persistent storage status, and performs the corresponding operation processing.
[0144] Step S518: The client receives a data read instruction for the data file.
[0145] After receiving a data read instruction to read a data file, the client will send a corresponding data read request to the target data node. The data read instruction can be understood as an instruction to read a data file. Since the data file has been trimmed before, the client reads the trimmed data. However, in practical applications, the data read instruction can also be an instruction to read any type of data file, such as reading untrimmed data.
[0146] Step S520: The client sends a data read request to the target data node according to the data read instruction.
[0147] Step S522: The target data node receives the data read request.
[0148] Step S524: The target data node generates a feedback result based on the data read request and sends it to the client.
[0149] After receiving a data read request, the target data node generates a feedback result based on the trim status of the data file. Specifically, when the user's personal information is successfully trimmed, the feedback result is all zeros; when the trimming of the user's personal information fails, the feedback result is the user's personal information; when part of the user's personal information is successfully trimmed and part is not, the feedback result is partly zeros and partly the user's personal information.
[0150] Step S526: The target data node receives a data write instruction for the target data, and in response to the data write instruction, determines a data range to be used in the local storage space, deletes the data in the data range to be used, and writes the target data into the data range to be used, wherein the data range to be used belongs to the data range to be operated.
[0151] When a target data node receives a data write instruction for the target data, it can reuse the data range to be operated on. The target data can be the updated data corresponding to the data file. When writing this type of data, the data range corresponding to the original deleted data can be selected for writing. The target data can also be other data files. When new data needs to be written to the data node, it can also be written to the data range to be operated on.
[0152] In summary, the storage system provided in this specification converts marking operation instructions for data files into data operation requests for data ranges of data nodes through the client. Unnecessary data stored in data nodes is marked, eliminating the need to process invalid data during subsequent garbage collection. This effectively reduces write amplification caused by garbage collection and minimizes garbage data. Furthermore, it allows new data to be written directly to the data range to be operated on when writing new data to data nodes, improving write efficiency, optimizing data node storage, and reducing storage costs.
[0153] Corresponding to the above method embodiments, this specification also provides data processing apparatus embodiments. Figure 6 shows a schematic diagram of a data processing apparatus applied to a client in a storage system, according to an embodiment of this specification. As shown in Figure 6, the device includes:
[0154] The receiving module 602 is configured to receive marking operation instructions for data files;
[0155] The parsing module 604 is configured to parse the marking operation instruction to obtain marking operation information;
[0156] The sending module 606 is configured to generate a data operation request for the corresponding target data node based on the marking operation information, and send the data operation request to the target data node.
[0157] Optionally, the parsing module 604 is further configured to:
[0158] Based on the marking operation information, the logical address information corresponding to the data file is determined, and a data operation request for the corresponding target data node is generated based on the logical address information.
[0159] Optionally, the storage system is a distributed file system, the client includes a tag operation interface, and the parsing module 604 is further configured to:
[0160] The marking operation interface is invoked to parse the marking operation instruction and obtain the marking operation information, wherein the marking operation information includes file identification information, file offset information, and file marking length information.
[0161] Optionally, the device further includes a query module configured to:
[0162] If the operation feedback information corresponding to the data operation request is determined to be an operation failure, the storage state information of the marking operation instruction is queried, and the operation processing corresponding to the storage state information is executed.
[0163] Optionally, the query module is further configured as follows:
[0164] If the storage state information is in a persistent storage state, based on the persistent data operation request, the step of sending the data operation request to the target data node is executed;
[0165] When the storage state information is in a non-persistent storage state, the processing result information corresponding to the marking operation instruction is generated and sent to the index system corresponding to the storage system, wherein the index system is used to resend the marking operation instruction to the storage system.
[0166] Optionally, the device further includes a reading module configured to:
[0167] Receive a data read instruction for the data file, and send a data read request to the target data node according to the data read instruction.
[0168] Optionally, the device further includes a storage module configured to:
[0169] The system receives a marking operation instruction for a data file and stores the marking operation information associated with the instruction in an operation log. If the operation log meets the log information processing conditions, the system updates the data operation table associated with the data file based on the marking operation information recorded in the operation log. Based on the updated data operation table, the system generates a data operation request for the corresponding target data node and sends the data operation request to the target data node.
[0170] The data processing device provided in this specification converts the marking operation instructions for data files into data operation requests for data ranges of data nodes through the client. It marks the unnecessary data stored in the data nodes, so that invalid data does not need to be processed again during subsequent garbage collection, effectively reducing write amplification caused by garbage collection and reducing garbage data.
[0171] The above is an illustrative scheme of the data processing apparatus of this embodiment. It should be noted that the technical solution of this data processing apparatus and the technical solution of the data processing method described above belong to the same concept. For details not described in detail in the technical solution of the data processing apparatus, please refer to the description of the technical solution of the data processing method described above.
[0172] Corresponding to the above method embodiments, this specification also provides data processing apparatus embodiments. Figure 7 shows a schematic diagram of a data processing apparatus applied to a target data node in a storage system, according to an embodiment of this specification. As shown in Figure 7, the device includes:
[0173] The receiving module 702 is configured to receive requests for data operations on a data file;
[0174] The marking module 704 is configured to mark the data range to be operated on in the local storage space that is associated with the data file according to the data operation request.
[0175] Optionally, the marking module 704 is further configured to:
[0176] Based on the operation description information carried in the data operation request, the data range to be operated associated with the data file is determined in the local storage space, and the attribute information of the data range to be operated is modified and saved as a marking process for the data range to be operated.
[0177] Optionally, the device further includes a writing module configured to:
[0178] Receive a data write instruction for target data and, in response to the data write instruction, determine a data range to be used in the local storage space, delete the data in the data range to be used, and write the target data into the data range to be used, wherein the data range to be used belongs to the data range to be operated on.
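As a non-limiting illustration, writing new data directly into a previously marked range might be sketched as follows. The `LocalStore` name, the first-fit selection of a marked range, and the handling of the unused remainder are assumptions made for this example only.

```python
class LocalStore:
    def __init__(self, initial):
        self.space = bytearray(initial)  # local storage space of the data node
        self.marked = []                 # (offset, length) of marked data ranges

    def mark(self, offset, length):
        self.marked.append((offset, length))

    def write(self, data):
        """Write data into the first marked range that is large enough.

        Returns the offset written to, or None if no marked range fits.
        """
        for i, (offset, length) in enumerate(self.marked):
            if length >= len(data):
                end = offset + len(data)
                # Delete the old data in the data range to be used.
                self.space[offset:end] = bytes(len(data))
                # Write the target data into the data range to be used.
                self.space[offset:end] = data
                # Any unused remainder of the range stays marked.
                rest = length - len(data)
                if rest:
                    self.marked[i] = (end, rest)
                else:
                    del self.marked[i]
                return offset
        return None
```

The range chosen for the write is always taken from the marked ranges, which corresponds to the condition that the data range to be used belongs to the data range to be operated on.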
[0179] Optionally, the device further includes a feedback module configured to:
[0180] Receive a data read request, generate a feedback result based on the data read request, and send the feedback result to the client.
[0181] Optionally, the feedback module is further configured to:
[0182] If the data operation request was executed successfully, generate a feedback result indicating that the data is in an unreadable state;
[0183] If the data operation request failed to execute, use the original data in the data range to be operated on as the feedback result;
[0184] If the data operation requests were only partly executed successfully, use the original data in the data range to be operated on that corresponds to the failed data operation request as a first feedback result, generate a second feedback result based on the data in the data range to be operated on that corresponds to the successful data operation request, and generate the feedback result based on the first feedback result and the second feedback result.
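As a non-limiting illustration, the three read feedback cases described above might be sketched as follows. The `Segment` type, the dictionary layout of a feedback entry, and the per-segment combination rule are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One data range covered by a read, with the outcome of its marking request."""
    data: bytes
    mark_succeeded: bool


def read_feedback(segments):
    """Build the feedback result for a read that covers the given segments.

    Successfully marked segments are reported as unreadable; segments whose
    marking request failed return their original data. A read that covers
    both kinds yields a combined result.
    """
    results = []
    for seg in segments:
        if seg.mark_succeeded:
            results.append({"readable": False, "data": None})
        else:
            results.append({"readable": True, "data": seg.data})
    return results
```

With this rule, a read over a fully marked file returns only unreadable entries, a read after a failed marking returns the original data, and a read after a partly successful marking returns a mix of both.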
[0185] With the data processing apparatus provided in this specification, the data node marks, according to the data operation request, the data range to be operated on in the local storage space associated with the data file. Invalid data therefore does not need to be processed again during subsequent garbage collection, which effectively reduces the write amplification caused by garbage collection and reduces garbage data. In addition, when new data is written to the data node, it can be written directly into the marked data range, which improves write efficiency, makes better use of the data node's storage, and reduces storage costs.
[0186] The above is an illustrative scheme of the data processing apparatus of this embodiment. It should be noted that the technical solution of this data processing apparatus and the technical solution of the data processing method described above belong to the same concept. For details not described in detail in the technical solution of the data processing apparatus, please refer to the description of the technical solution of the data processing method described above.
[0187] Figure 8 shows a structural block diagram of a computing device 800 according to an embodiment of this specification. The components of the computing device 800 include, but are not limited to, a memory 810 and a processor 820. The processor 820 is connected to the memory 810 via a bus 830, and a database 850 is used to store data.
[0188] The computing device 800 also includes an access device 840, which enables the computing device 800 to communicate via one or more networks 860. Examples of these networks include a Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 840 may include one or more of any type of wired or wireless network interface (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Wi-MAX interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.
[0189] In one embodiment of this specification, the above-described components of the computing device 800 and other components not shown in Figure 8 can also be connected to each other, for example, via a bus. It should be understood that the block diagram of the computing device shown in Figure 8 is for illustrative purposes only and is not intended to limit the scope of this specification. Those skilled in the art can add or replace other components as needed.
[0190] The computing device 800 can be any type of stationary or mobile computing device, including mobile computers or mobile computing devices (e.g., tablet computers, personal digital assistants, laptop computers, notebook computers, netbooks, etc.), mobile phones (e.g., smartphones), wearable computing devices (e.g., smartwatches, smart glasses, etc.) or other types of mobile devices, or stationary computing devices such as desktop computers or PCs. The computing device 800 can also be a mobile or stationary server.
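[0191] The processor 820 is configured to execute computer instructions stored in the memory 810 and, when executing the computer instructions, implements the steps of the data processing method described above.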
[0192] The above is an illustrative scheme of a computing device according to this embodiment. It should be noted that the technical solution of this computing device and the technical solution of the data processing method described above belong to the same concept. For details not described in detail in the technical solution of the computing device, please refer to the description of the technical solution of the data processing method described above.
[0193] An embodiment of this specification also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the data processing method as described above.
[0194] The above is an illustrative scheme of a computer-readable storage medium according to this embodiment. It should be noted that the technical solution of this storage medium and the technical solution of the data processing method described above belong to the same concept. For details not described in detail in the technical solution of the storage medium, please refer to the description of the technical solution of the data processing method described above.
[0195] An embodiment of this specification also provides a computer program, wherein when the computer program is executed in a computer, it causes the computer to perform the steps of the above-described data processing method.
[0196] The above is an illustrative example of a computer program according to this embodiment. It should be noted that the technical solution of this computer program and the technical solution of the data processing method described above belong to the same concept. Details not described in detail in the technical solution of the computer program can be found in the description of the technical solution of the data processing method described above.
[0197] The foregoing has described specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than that shown in the embodiments and may still achieve the desired result. Furthermore, the processes depicted in the drawings do not necessarily require the specific or sequential order shown to achieve the desired result. In some embodiments, multitasking and parallel processing are possible or may be advantageous.
[0198] The computer instructions include computer program code, which may be in the form of source code, object code, executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording media, USB flash drive, portable hard drive, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium may be appropriately added to or subtracted according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
[0199] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that the embodiments in this specification are not limited to the described order of actions, because according to the embodiments in this specification, some steps can be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily essential to the embodiments in this specification.
[0200] In the above embodiments, the descriptions of each embodiment have different focuses. For parts not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.
[0201] The preferred embodiments disclosed above are merely illustrative of this specification. Optional embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the embodiments described herein. These embodiments are selected and specifically described in this specification to better explain the principles and practical applications of the embodiments, thereby enabling those skilled in the art to better understand and utilize this specification. This specification is limited only by the claims and their full scope and equivalents.
Claims
1. A storage system, the system comprising a client and a data node; the client is configured to receive a marking operation instruction for a data file, parse the marking operation instruction to obtain marking operation information, generate a data operation request for the corresponding target data node based on the marking operation information, and send the data operation request to the target data node; the target data node is configured to receive the data operation request and, according to the data operation request, mark the data range to be operated on in the local storage space associated with the data file; the client is further configured to, when determining that the operation feedback information corresponding to the data operation request indicates an operation failure, query the storage status information of the marking operation instruction and execute the operation processing corresponding to the storage status information, wherein the operation processing includes sending the data operation request to the target data node, or sending the processing result information corresponding to the marking operation instruction to the index system corresponding to the storage system, and the index system is used to resend the marking operation instruction to the storage system.
2. The system as described in claim 1, wherein the client is configured to determine the logical address information corresponding to the data file based on the marking operation information, and generate a data operation request for the corresponding target data node based on the logical address information.
3. The system as described in claim 1, wherein the target data node is configured to determine operation description information based on the logical address information carried in the data operation request, determine the data range to be operated on that is associated with the data file in the local storage space based on the operation description information, and modify and save the attribute information of the data range to be operated on as the marking processing for the data range to be operated on.
4. The system as described in claim 1, wherein the client is configured to, when the storage status information is a persistent storage state, execute the step of sending the data operation request to the target data node based on the persistent data operation request; and, when the storage status information is a non-persistent storage state, generate processing result information corresponding to the marking operation instruction and send it to an index system corresponding to the storage system, wherein the index system is used to resend the marking operation instruction to the storage system.
5. The system as described in claim 1, wherein the target data node is further configured to receive a data write instruction for the target data, and in response to the data write instruction, determine a data range to be used in the local storage space, delete the data in the data range to be used, and write the target data into the data range to be used.
6. The system as described in claim 1, wherein the client is further configured to receive a data read instruction for the data file, and send a data read request to the target data node according to the data read instruction; the target data node is further configured to receive the data read request, generate a feedback result based on the data read request, and send the feedback result to the client.
7. The system as described in claim 6, wherein the target data node is configured to: generate a feedback result indicating that the data is in an unreadable state if the data operation request was executed successfully; use the original data in the data range to be operated on as the feedback result if the data operation request failed to execute; and, if the data operation requests were only partly executed successfully, use the original data in the data range to be operated on that corresponds to the failed data operation request as a first feedback result, generate a second feedback result based on the data in the data range to be operated on that corresponds to the successful data operation request, and generate the feedback result based on the first feedback result and the second feedback result.
8. The system as described in claim 1, wherein the client is configured to receive a marking operation instruction for a data file, store the marking operation information associated with the marking operation instruction in an operation log, and update the data operation table associated with the data file according to the marking operation information recorded in the operation log when the operation log meets the log information processing conditions; and to generate, based on the updated data operation table, a data operation request for the corresponding target data node and send the data operation request to the target data node.
9. The system according to any one of claims 1 to 8, wherein the storage system is a distributed file system, and the client includes a marking operation interface; the client is configured to invoke the marking operation interface to parse the marking operation instruction and obtain the marking operation information, wherein the marking operation information includes file identification information, file offset information, and file mark length information.
10. A data processing method applied to a client in a storage system, comprising: receiving a marking operation instruction for a data file; parsing the marking operation instruction to obtain marking operation information; generating a data operation request for the corresponding target data node based on the marking operation information, and sending the data operation request to the target data node; and, if the operation feedback information corresponding to the data operation request is determined to indicate an operation failure, querying the storage status information of the marking operation instruction and executing the operation processing corresponding to the storage status information, wherein the operation processing is to send the data operation request to the target data node, or to send the processing result information corresponding to the marking operation instruction to the index system corresponding to the storage system.
11. A data processing method applied to a target data node in a storage system, the target data node interacting with the client of claim 10, comprising: receiving a data operation request for a data file; and marking, according to the data operation request, the data range to be operated on in the local storage space associated with the data file.
12. A computing device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of claim 10 or 11.
13. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the method of claim 10 or 11.