Data processing method, apparatus, device, medium, and product

By logically dividing the data cluster into multiple logical sub-clusters and performing additional preset operations during writing, the I/O performance degradation caused by COW operations under the QCOW2 format is resolved, thus improving I/O performance without modifying the disk format.

CN122240033A · Pending · Publication Date: 2026-06-19 · NEW H3C BIG DATA TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NEW H3C BIG DATA TECH CO LTD
Filing Date
2026-04-29
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

The existing QCOW2 format copy-on-write (COW) operation leads to a decrease in virtual machine I/O performance in high-concurrency scenarios, especially in high-concurrency write scenarios. Frequent COW operations will cause increased virtual machine response latency and I/O performance bottlenecks.

Method used

The data cluster is logically divided into multiple smaller logical sub-clusters, and when writing target data, an additional preset write operation (such as zeroing or data copying) is performed on the first and/or last logical sub-cluster touched by the write to ensure the accuracy of subsequent read operations.

Benefits of technology

By logically dividing data clusters into sub-clusters, the amount of additional data written is reduced, the I/O load is lowered, and I/O performance is improved, all without modifying the disk format.

✦ Generated by Eureka AI based on patent content.

Abstract

This application relates to the field of data processing technology and discloses a data processing method, apparatus, device, medium, and product. The method includes: obtaining a write request that includes target data to be written and write address information; determining, based on the write address information, a target data cluster for writing the target data, and determining a starting logical sub-cluster and an ending logical sub-cluster, where the target data cluster includes multiple logical sub-clusters, the starting logical sub-cluster is the logical sub-cluster corresponding to the starting position for writing the target data, and the ending logical sub-cluster is the logical sub-cluster corresponding to the ending position for writing the target data; performing a preset write operation on a target logical sub-cluster, where the target logical sub-cluster includes whichever of the starting and ending logical sub-clusters has not yet had data written to it; and writing the target data to the target data cluster. With this application, at most two additional logical sub-clusters' worth of data needs to be written when writing the target data, which can significantly reduce the data processing volume.

Description

Technical Field

[0001] This application relates to the field of data processing technology, specifically to data processing methods, apparatus, equipment, media, and products.

Background Technology

[0002] With the widespread application of cloud computing and virtualization technologies, virtual machine snapshots have become an important means of ensuring business continuity and achieving fault recovery. Currently, the QCOW2 format is widely used as a virtual disk image format, and one of its core mechanisms is copy-on-write (COW). Current COW operations can easily impact I/O (Input/Output) performance, especially in high-concurrency scenarios, leading to increased virtual machine latency.

Summary of the Invention

[0003] In view of this, this application provides a data processing method, apparatus, device, medium, and product to solve the problem of poor COW operation performance.

[0004] In a first aspect, this application provides a data processing method, the method comprising: obtaining a write request, the write request including target data to be written and write address information; determining, based on the write address information, a target data cluster for writing the target data, and determining a starting logical sub-cluster and an ending logical sub-cluster, where the target data cluster includes multiple logical sub-clusters, the starting logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the starting position for writing the target data, and the ending logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the ending position for writing the target data; performing a preset write operation on a target logical sub-cluster, where the target logical sub-cluster includes whichever of the starting logical sub-cluster and the ending logical sub-cluster has not yet had data written to it; and writing the target data into the target data cluster.

[0005] In a second aspect, this application provides a data processing apparatus, the apparatus comprising: an acquisition module, used to obtain a write request, the write request including target data to be written and write address information; a determining module, used to determine, based on the write address information, a target data cluster for writing the target data, and to determine a starting logical sub-cluster and an ending logical sub-cluster, where the target data cluster includes multiple logical sub-clusters, the starting logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the starting position for writing the target data, and the ending logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the ending position for writing the target data; a processing module, used to perform a preset write operation on a target logical sub-cluster, where the target logical sub-cluster includes whichever of the starting logical sub-cluster and the ending logical sub-cluster has not yet had data written to it; and an operation module, used to write the target data into the target data cluster.

[0006] In a third aspect, this application provides an electronic device, including a memory and a processor that are communicatively connected to each other. The memory stores computer instructions, and the processor executes the computer instructions to perform the data processing method described in the first aspect or any corresponding embodiment.

[0007] In a fourth aspect, this application provides a computer-readable storage medium storing computer instructions for causing a computer to perform the data processing method described in the first aspect or any corresponding embodiment.

[0008] In a fifth aspect, this application provides a computer program product, including computer instructions for causing a computer to perform the data processing method described in the first aspect or any corresponding embodiment.

[0009] The data processing method provided in this application logically divides a data cluster into multiple logical subclusters. When target data needs to be written, the starting and ending logical subclusters can be determined based on the start and end positions of the target data being written. An additional preset write operation is performed on whichever of the starting and/or ending logical subclusters is being written for the first time, ensuring the accuracy of subsequent processing such as read operations. Because the data cluster is divided into multiple smaller logical subclusters, at most two additional logical subclusters of data need to be written when writing target data, which significantly reduces the data processing volume and I/O load and improves I/O performance. Furthermore, this method only logically divides the data clusters and does not require modification of the disk format itself, thus achieving performance optimization without altering the disk format.

Brief Description of the Drawings

[0010] To more clearly illustrate the technical solutions in the specific embodiments of this application or the prior art, the drawings used in the description of the specific embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained from these drawings without creative effort.

[0011] Figure 1 is a schematic diagram illustrating an application scenario according to an embodiment of this application;
Figure 2 is a schematic flowchart of a first data processing method according to an embodiment of this application;
Figure 3 is a schematic diagram of accessing a data cluster based on two-level table address mapping according to an embodiment of this application;
Figure 4 is a schematic diagram of a second data processing method according to an embodiment of this application;
Figure 5 is a schematic diagram illustrating the logical partitioning and space management of data clusters according to an embodiment of this application;
Figure 6 is a schematic diagram of a logical sub-cluster partition according to an embodiment of this application;
Figure 7 is a schematic diagram of a third data processing method according to an embodiment of this application;
Figure 8 is a structural block diagram of a data processing apparatus according to an embodiment of this application;
Figure 9 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application.

Detailed Description

[0012] To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0013] It is understood that before using the technical solutions disclosed in the various embodiments of this application, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this application in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.

[0014] The terms "first" and "second" are used for descriptive purposes only and should not be construed as indicating or implying relative importance or implicitly specifying the number of technical features indicated. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "multiple" means two or more, unless otherwise explicitly specified.

[0015] Before providing a detailed description of the embodiments of this application, some of the terms and concepts involved in the embodiments of this application will be explained. These explanations are intended to make the embodiments of this application easier to understand and should not be considered as limiting the scope of protection claimed in this application.

[0016] (1) Copy-on-Write (COW), a core technology of virtual machine snapshots, copies the original data blocks when a write operation occurs.

[0017] (2) QCOW2 (QEMU Copy-on-Write Version 2) is a commonly used virtual disk format on the KVM (Kernel-based Virtual Machine) platform, which supports features such as snapshots, compression, and encryption.

[0018] (3) Data Cluster, the smallest allocation unit in QCOW2, with a default size of 64KB or larger.

[0019] (4) Guest Offset Address: The logical disk address used inside the virtual machine.

[0020] (5) Host Offset Address: The physical storage location of the virtual machine logical address (i.e., the guest offset address) after being mapped by QCOW2, representing the actual physical address of the data cluster.

[0021] (6) L1 table (Level 1 mapping table), the top-level address mapping table in the QCOW2 image, where each element (L1 table entry) stores the host offset address (HOA_L2) of an L2 table.

[0022] (7) L2 table (Level 2 mapping table), the second-level address mapping table pointed to by L1 table entries. Each element (L2 table entry) stores the host offset address of the corresponding data cluster, or marks the status of the data cluster (such as whether it is allocated, compressed, or encrypted).

[0023] After creating a snapshot of the QCOW2 image file, when a virtual machine initiates a write operation on an existing data block, the system needs to first copy the original content of the data block to a new location (i.e., the COW operation) before allowing new data to be written, in order to ensure the consistency of the snapshot data.

[0024] However, the basic unit of a copy-on-write (COW) operation is the entire "data cluster," which is typically 64KB or even larger. Even if only a few bytes are modified, the entire data cluster must be copied, easily causing a large amount of unnecessary I/O overhead. Especially in high-concurrency write scenarios (such as database transaction log flushing, operating system updates, etc.), frequent COW operations can severely slow down I/O performance, leading to increased virtual machine response latency and a degraded user experience.

[0025] If the standard QCOW2 format is used to implement basic snapshot functionality, and each data cluster has a fixed size (e.g., 64KB), after a snapshot is created, the first write operation to the data cluster will trigger the COW process. The COW mechanism is as follows: for writing to any offset address, if the data cluster already contains data, the entire data cluster must be copied to the new location; only after the copy is completed can new data be written to the target cluster.

[0026] Figure 1 shows a schematic diagram of a copy-on-write (COW) operation. Suppose the virtual machine needs to write 4KB of data to a logical block address (LBA), the corresponding data cluster is determined to be data cluster A, and the size of the data cluster is 64KB. If data cluster A already contains data, a COW operation is triggered and a new data cluster B is allocated. As shown in Figure 1, 60KB of data in data cluster A is then copied to data cluster B, the aforementioned 4KB of data is written into data cluster B, and the L2 table is updated to point to data cluster B.
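The copy cost in this example can be checked with a little arithmetic. The following is a minimal sketch, not part of the application: the function name `cow_copy_bytes` is hypothetical, and it assumes the write lands entirely inside one already-allocated cluster.

```python
KB = 1024


def cow_copy_bytes(cluster_size: int, write_len: int) -> int:
    """Bytes of existing data copied by a whole-cluster COW for one write.

    Assumption: the write fits inside a single cluster that already holds data.
    """
    if not 0 < write_len <= cluster_size:
        raise ValueError("write must fit inside a single cluster")
    return cluster_size - write_len


copied = cow_copy_bytes(64 * KB, 4 * KB)
print(copied // KB)            # 60 (KB copied from cluster A to cluster B)
print((64 * KB) // (4 * KB))   # 16 (bytes landing in cluster B per requested byte)
```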

[0027] Therefore, when performing a write operation, even if only a few bytes are modified, a large portion of the data cluster (such as the 60KB of data mentioned above) must be copied. This redundant copying results in excessive overhead for COW operations, wasting bandwidth and time. In high-concurrency write scenarios, COW operations become a performance bottleneck, with RPS (requests per second) dropping by more than 70% and significant I/O latency.

[0028] In addition, in non-snapshot scenarios, newly allocated data clusters also need to be zeroed out, which will also affect I/O efficiency.

[0029] This application provides a data processing method that divides data clusters into multiple smaller logical sub-clusters. This reduces the amount of data processing required when writing target data: at most two additional logical sub-clusters need to be written. This significantly reduces I/O load and improves I/O performance. Furthermore, this method only logically divides the data clusters and does not require modification of the disk format itself, thus achieving performance optimization without altering the disk format.

[0030] According to an embodiment of this application, a data processing method embodiment is provided. It should be noted that the steps shown in the flowcharts in the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Furthermore, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order than that shown here.

[0031] This embodiment provides a data processing method that can be applied to devices such as servers and host machines that can provide virtual machine functionality. Figure 2 is a flowchart of a data processing method according to an embodiment of this application. As shown in Figure 2, the process includes the following steps.

[0032] Step S201: Obtain a write request; the write request includes: the target data to be written and the write address information.

[0033] When data needs to be written to the disk, a corresponding write request can be initiated. Specifically, this write request can be a write operation request initiated by the virtual machine; the virtual machine can initiate the corresponding write request whenever data needs to be written.

[0034] The write request includes the data to be written, which will be referred to as the "target data" for ease of description. Furthermore, the write request also includes the location and region on the disk where the target data should be written, i.e., it includes write address information.

[0035] In some instances, the write address information includes the write start address (or the entry address for writing) and the data size, which is the length of the target data. Based on this write start address and data size, the target data can be written to the target storage area on the disk. The starting address of the target storage area is the write start address, and the size of the target storage area is the data size. Specifically, for write requests initiated by a virtual machine, the write start address is the guest offset address, which is a type of Logical Block Address (LBA). Accordingly, the write address information specifically includes: the guest offset address (LBA) and the data size (len).

[0036] For example, the write address information includes LBA=0x1024 and len=4096, which means that 4KB of target data needs to be written starting at that address.

[0037] Step S202: Determine the target data cluster for writing target data based on the write address information, and determine the starting logical sub-cluster and the ending logical sub-cluster. The target data cluster includes multiple logical sub-clusters. The starting logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the starting position for writing target data, and the ending logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the ending position for writing target data.

[0038] After obtaining the write address information, it can be determined which data cluster(s) the target data needs to be written to. For ease of description, this data cluster is referred to as the "target data cluster". It can be understood that there can be one or more target data clusters. Specifically, this data cluster can be a data cluster of a QCOW2 image file.

[0039] In a virtual machine scenario, since the write address information generally includes the guest offset address, it needs to be converted into the actual physical address, i.e., the host offset address. Then, the target location for writing the target data can be located based on the host offset address, and the data cluster corresponding to the target location is the target data cluster.

[0040] Figure 3 shows a schematic diagram of accessing a data cluster based on two-level table address mapping. As shown in Figure 3, by querying the L1 table based on the guest offset address, the corresponding L1 table entry can be determined, which in turn locates the L2 table corresponding to that L1 table entry. Finally, by querying the L2 table, the corresponding L2 table entry is determined. This L2 table entry records the offset address of the data cluster, and this data cluster is the target data cluster. Furthermore, the intra-cluster offset can be determined based on the guest offset address. By adding this intra-cluster offset to the offset address of the target data cluster, the host offset address corresponding to that guest offset address can be determined.
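The two-level lookup can be sketched as follows. This is an illustrative toy model rather than the application's implementation: the tables are plain Python containers, the names `split_guest_offset` and `guest_to_host` and the sample offsets are hypothetical, and the index split assumes the standard QCOW2 layout in which each L2 table occupies one cluster of 8-byte entries.

```python
CLUSTER_SIZE = 64 * 1024  # assumed data cluster size (64KB)


def split_guest_offset(guest_offset: int, cluster_size: int = CLUSTER_SIZE):
    """Split a guest offset into (L1 index, L2 index, intra-cluster offset)."""
    l2_entries = cluster_size // 8          # 8-byte entries per L2 table
    cluster_no = guest_offset // cluster_size
    return cluster_no // l2_entries, cluster_no % l2_entries, guest_offset % cluster_size


def guest_to_host(guest_offset: int, l1_table, l2_tables,
                  cluster_size: int = CLUSTER_SIZE) -> int:
    """Walk L1 -> L2 -> data cluster and add the intra-cluster offset."""
    l1_index, l2_index, in_cluster = split_guest_offset(guest_offset, cluster_size)
    l2_table_offset = l1_table[l1_index]                     # L1 entry: where the L2 table lives
    cluster_offset = l2_tables[l2_table_offset][l2_index]    # L2 entry: where the data cluster lives
    return cluster_offset + in_cluster


# Hypothetical image: one L2 table at 0x30000 mapping guest cluster 0 to host 0x50000.
l1_table = [0x30000]
l2_tables = {0x30000: {0: 0x50000}}
print(hex(guest_to_host(0x1024, l1_table, l2_tables)))  # 0x51024
```

An unmapped guest offset raises `KeyError` or `IndexError` in this sketch; a real implementation would treat that case as an unallocated cluster instead.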

[0041] This embodiment introduces the concept of "logical subclusters." Each physical data cluster is logically divided into multiple subclusters, i.e., multiple logical subclusters. The logical subclusters may differ in size; however, to facilitate accurate positioning of logical subclusters, a data cluster is generally divided into multiple logical subclusters of equal length. To ensure performance, the number of logical subclusters needs to be greater than a preset number, such as 4 or 8. Furthermore, the number of logical subclusters (or the size of the logical subclusters) is configurable and supports dynamic adjustment to adapt to different scenarios.

[0042] The size of the logical sub-clusters can be 1KB, 2KB, 4KB, 8KB, etc. For example, if the data cluster is 64KB in size, and it is divided into 32 logical sub-clusters of equal length, then the size of each logical sub-cluster is 2KB; if the data cluster is divided into 16 logical sub-clusters of equal length, then the size of each logical sub-cluster is 4KB.
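The relationship between cluster size, sub-cluster count, and sub-cluster size in this example is a simple division. A minimal sketch, assuming equal-length sub-clusters; the helper name `subcluster_size` is hypothetical.

```python
KB = 1024


def subcluster_size(cluster_size: int, count: int) -> int:
    """Size of each logical sub-cluster when a cluster is split into `count` equal parts."""
    if count <= 0 or cluster_size % count != 0:
        raise ValueError("cluster size must be evenly divisible by the sub-cluster count")
    return cluster_size // count


print(subcluster_size(64 * KB, 32) // KB)  # 2
print(subcluster_size(64 * KB, 16) // KB)  # 4
```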

[0043] Based on the write address information, the starting position and ending position for writing target data within the target data cluster can be determined. It can be understood that the length between the starting and ending positions is consistent with the size of the target data. Furthermore, the logical sub-cluster corresponding to the starting position can be determined and designated as the "starting logical sub-cluster"; similarly, the logical sub-cluster corresponding to the ending position can be determined and designated as the "ending logical sub-cluster".

[0044] Step S203: Perform a preset write operation on the target logical sub-cluster; the target logical sub-cluster includes whichever of the starting logical sub-cluster and the ending logical sub-cluster has not yet had data written to it.

[0045] In this embodiment, at least one of the starting logical subcluster and the ending logical subcluster may be receiving data for the first time, meaning that no data has been written to it before. For ease of description, a logical subcluster that is being written to for the first time is referred to as a target logical subcluster. It can be understood that if only the starting logical subcluster is being written to for the first time, the target logical subcluster includes only the starting logical subcluster; if only the ending logical subcluster is being written to for the first time, the target logical subcluster includes only the ending logical subcluster; and if both are being written to for the first time, the target logical subcluster includes both the starting and ending logical subclusters.

[0046] For the first write operation to the target logical subcluster, to avoid affecting subsequent data processing (such as read operations initiated by the virtual machine), a preset write operation needs to be performed on the target logical subcluster. This operation writes the correct data into the target logical subcluster to ensure the accuracy of subsequent data processing. This preset write operation will be explained in detail later.

[0047] If the starting logical subcluster or the ending logical subcluster is not being written to for the first time, meaning that data has already been written to it before, then no additional preset write operation is required for that logical subcluster.
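The selection of target logical subclusters in step S203 can be sketched as below. This is only an illustration under stated assumptions: the written/unwritten state of each subcluster is modeled as a per-cluster integer bitmap (bit i set means subcluster i already holds data), and the function name `target_subclusters` is hypothetical.

```python
def target_subclusters(start_idx: int, end_idx: int, written_bitmap: int) -> list:
    """Return the boundary subclusters that are being written for the first time.

    Only the starting and ending subclusters are examined, so the result
    never contains more than two indices.
    """
    targets = []
    for idx in (start_idx, end_idx):
        already_written = (written_bitmap >> idx) & 1
        if not already_written and idx not in targets:
            targets.append(idx)
    return targets


print(target_subclusters(2, 4, 0b00100))  # [4]    only the ending subcluster is new
print(target_subclusters(2, 4, 0b00000))  # [2, 4] both boundary subclusters are new
print(target_subclusters(2, 4, 0b10100))  # []     no preset write operation needed
```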

[0048] Step S204: Write the target data to the target data cluster.

[0049] Furthermore, the target data needs to be written to the target data cluster to complete the write operation. This means that, based on the write address information, the target data needs to be written to the corresponding location within the target data cluster, specifically to the area between the aforementioned start and end positions.

[0050] Thus, after the virtual machine initiates a write request, in addition to writing the target data normally, at most only two logical subclusters need to be written using additional pre-defined write operations. Since the size of the logical subcluster is much smaller than the size of the data cluster, the amount of data required for the additional write operations can be significantly reduced. For example, if the size of the logical subcluster is 2KB, then at most only 4KB of data needs to be written additionally.
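The upper bound stated above follows directly from the subcluster size. A minimal sketch, assuming equal-length subclusters; `max_extra_write` is a hypothetical helper and it gives only a worst-case bound, not the exact number of bytes a given implementation would write.

```python
KB = 1024


def max_extra_write(subcluster_size: int, first_written: int) -> int:
    """Worst-case extra bytes written by the preset write operations.

    `first_written` is how many of the two boundary subclusters (start, end)
    are receiving data for the first time: 0, 1, or 2.
    """
    if first_written not in (0, 1, 2):
        raise ValueError("only the starting and ending subclusters can be targets")
    return subcluster_size * first_written


print(max_extra_write(2 * KB, 2) // KB)  # 4
print(max_extra_write(2 * KB, 0) // KB)  # 0
```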

[0051] The data processing method provided in this embodiment logically divides a data cluster into multiple logical subclusters. When target data needs to be written, the starting and ending logical subclusters can be determined based on the start and end positions of the target data. An additional preset write operation is performed on whichever of the starting and/or ending logical subclusters is being written for the first time, to ensure the accuracy of subsequent read operations. Because the data cluster is divided into multiple smaller logical subclusters, at most two additional logical subclusters of data need to be written when writing target data, which significantly reduces the data processing volume and I/O load and improves I/O performance. Furthermore, this method only logically divides the data clusters and does not require modification of the disk format itself, thus achieving performance optimization without altering the disk format.

[0052] This embodiment provides a data processing method that can be applied to devices such as servers and host machines that can provide virtual machine functionality. Figure 4 is a flowchart of a data processing method according to an embodiment of this application. As shown in Figure 4, the process includes the following steps.

[0053] Step S401: Obtain a write request; the write request includes: the target data to be written and the write address information.

[0054] For details, see step S201 of the embodiment shown in Figure 2; they are not repeated here.

[0055] Step S402: Determine the target data cluster to be used for writing target data based on the write address information.

[0056] For details, see the relevant description of step S202 in the embodiment shown in Figure 2; they are not repeated here.

[0057] In some optional implementations, step S402, "determining the target data cluster for writing target data based on the write address information," may include steps a1 to a2.

[0058] Step a1: Perform address mapping based on the write address information to determine the first data cluster that has a mapping relationship with the write address information.

[0059] Step a2: If the first data cluster is a snapshot data cluster that is being written to for the first time, allocate a new second data cluster and use the second data cluster as the target data cluster.

[0060] In this embodiment, for QCOW2 format files, an address mapping table is provided, specifically including an L1 table and an L2 table. For the write address information in a write request, address mapping can be performed by querying the address mapping table, thereby determining the data cluster that directly has a mapping relationship with the write address information, i.e., the first data cluster.

[0061] For example, the write address information includes the guest offset address. Based on the guest offset address, the address mapping table can be queried to determine the first data cluster that has a mapping relationship with the guest offset address.

[0062] For the first data cluster, it can be determined whether data is being written to it for the first time. For example, the L2 table records metadata of the corresponding data cluster, including a flag indicating whether data has been written to the data cluster (e.g., whether storage space has been allocated). If the flag indicates that no data has been written, the current write is the first write to the first data cluster.

[0063] Furthermore, some data clusters are used to record snapshots, and these snapshot data clusters are typically read-only. If the first data cluster is a snapshot data cluster and is being written to for the first time, data cannot be written to it directly; a new data cluster, namely the second data cluster, needs to be allocated. The second data cluster is used to write the target data; that is, the second data cluster serves as the target data cluster.

[0064] The newly allocated second data cluster can be the data cluster of another file, which also has a corresponding address mapping table. When it is necessary to perform read and write operations on the second data cluster, the second data cluster can be located based on the address mapping table.

[0065] Furthermore, if the first data cluster is not being written to for the first time, it can be directly used as the target data cluster.

[0066] If the first data cluster is a data cluster unrelated to the snapshot data cluster, that is, the first data cluster is a data cluster in a non-snapshot scenario, then regardless of whether it is the first time data is written, the first data cluster can be used as the target data cluster.
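The decision described for steps a1 and a2 can be summarized in a few lines. The following is a sketch under stated assumptions, not the application's implementation: the `Cluster` record, its field names, and the `allocate` callback are all hypothetical stand-ins for whatever metadata and allocator an actual implementation uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Cluster:
    host_offset: int    # where the cluster lives in the image file
    is_snapshot: bool   # cluster belongs to a (read-only) snapshot
    first_write: bool   # True if no data has been written to it by the current image yet


def choose_target_cluster(first: Cluster, allocate: Callable[[], Cluster]) -> Cluster:
    """Pick the cluster that will actually receive the target data."""
    if first.is_snapshot and first.first_write:
        # Snapshot cluster written for the first time: redirect to a new cluster.
        return allocate()
    # Already-written cluster, or a non-snapshot cluster: write in place.
    return first


snap = Cluster(0x50000, is_snapshot=True, first_write=True)
target = choose_target_cluster(snap, lambda: Cluster(0x90000, False, True))
print(hex(target.host_offset))  # 0x90000
```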

[0067] Figure 5 shows a schematic diagram of logical partitioning and space management for data clusters. As shown in Figure 5, the data cluster is divided into multiple logical subclusters. If the size of each logical subcluster is 2KB, then the 64KB data cluster contains a total of 32 logical subclusters, namely logical subclusters S0 to S31. The L2 table contains an entry that records the entry offset address of the data cluster, which is used to locate the data cluster. The L2 table may also include sub-cluster bitmap information for the data cluster, such as the sub-cluster bitmap itself or the storage address of the sub-cluster bitmap, which will be described later.

[0068] Step S403: Determine the starting logical sub-cluster and the ending logical sub-cluster based on the write address information; the target data cluster includes multiple logical sub-clusters, the starting logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the starting position for writing target data, and the ending logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the ending position for writing target data.

[0069] For details, see the relevant description of step S202 in the embodiment shown in Figure 2; they are not repeated here.

[0070] In some alternative implementations, step S403, "determining the starting logical subcluster and the ending logical subcluster," may include steps b1 to b2.

[0071] Step b1: Determine the starting write offset address corresponding to the starting position for writing the target data and the ending write offset address corresponding to the ending position for writing the target data based on the write address information.

[0072] In this embodiment, the offset address within the data cluster is used to represent the start and end positions of the target data being written. Specifically, based on the write address information, the intra-cluster offset address corresponding to the start position in the target data cluster, i.e., the start write offset address, can be determined; and the intra-cluster offset address corresponding to the end position in the target data cluster, i.e., the end write offset address, can also be determined.

[0073] For example, if the write address information includes the client offset address and the data size, the corresponding host offset address can be determined based on the client offset address, and the starting write offset address can be determined based on the host offset address and the data cluster size. Similarly, the ending write offset address can be determined using the data size.

[0074] Optionally, the starting write offset address is $O_{start} = O_{host} \bmod C$, and the ending write offset address is $O_{end} = (O_{host} + L - 1) \bmod C$.

[0075] Here, $O_{start}$ denotes the starting write offset address; $O_{host}$ denotes the host offset address determined from the write address information; $C$ denotes the data cluster size (e.g., 64KB); $O_{end}$ denotes the ending write offset address; $L$ denotes the length of the target data (i.e., its size); and $\bmod$ denotes the modulo operation.

[0076] In this embodiment, the host offset address $O_{host}$ determined from the write address information is a starting address, so taking it modulo the data cluster size $C$ yields the intra-cluster offset address of the starting position, i.e., the starting write offset address $O_{start}$. Similarly, because intra-cluster offset addresses start from 0, one must be subtracted from the data length $L$ before the modulo operation when determining the ending write offset address $O_{end}$, so that the result is exact.
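The modulo calculation described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function name `write_offsets`, the sample host offset, and the restriction to a write that stays inside one data cluster are assumptions made for the example.

```python
CLUSTER_SIZE = 64 * 1024  # data cluster size from the example in the text (64KB)


def write_offsets(host_offset: int, length: int,
                  cluster_size: int = CLUSTER_SIZE) -> tuple[int, int]:
    """Return (start, end) intra-cluster offsets for a write.

    `end` is the offset of the last byte written (inclusive), which is
    why one is subtracted from the length before the modulo.
    Hypothetical helper: assumes the write does not cross a cluster boundary.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    start = host_offset % cluster_size
    end = (host_offset + length - 1) % cluster_size
    return start, end


# Hypothetical request: 7000 bytes at byte 3000 of the fourth cluster.
start, end = write_offsets(3 * CLUSTER_SIZE + 3000, 7000)
```

With these sample values the write starts at intra-cluster offset 3000 and its last byte lands at offset 9999.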

[0077] Step b2: Take the logical sub-cluster corresponding to the starting write offset address in the target data cluster as the starting logical sub-cluster, and take the logical sub-cluster corresponding to the ending write offset address as the ending logical sub-cluster.

[0078] Each logical sub-cluster within the target data cluster corresponds to a specific offset address range. Once the starting write offset address has been determined, the logical sub-cluster whose offset address range contains it is the starting logical sub-cluster. Similarly, the logical sub-cluster whose offset address range contains the ending write offset address is the ending logical sub-cluster.

[0079] Figure 6 illustrates a partitioning into logical sub-clusters that is the same as the partitioning in Figure 5. As shown in Figure 6, the data cluster is logically divided into 32 logical sub-clusters, S0 to S31, where S_i denotes the logical sub-cluster with intra-cluster index i, i = 0, 1, 2, ..., 31.

[0080] Figure 6 illustrates one case of the starting and ending write offset addresses. As shown in Figure 6, the starting write offset address lies within logical sub-cluster S1, so S1 is the starting logical sub-cluster; similarly, the ending write offset address lies within logical sub-cluster S4, so S4 is the ending logical sub-cluster.

[0081] It can be understood that the region between the starting write offset address and the ending write offset address (the gray area in Figure 6) is the write region into which the target data is written.
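As an illustration of step b2, the sketch below maps an intra-cluster offset to its logical sub-cluster and lists every sub-cluster a write touches. The helper names and the sample offsets 3000 and 9999 are hypothetical; they were chosen so that the result matches the S1 to S4 case of Figure 6.

```python
SUBCLUSTER_SIZE = 2 * 1024  # logical sub-cluster size from the example (2KB)


def subcluster_index(offset: int, sub_size: int = SUBCLUSTER_SIZE) -> int:
    """Index i of the logical sub-cluster S_i whose address range contains `offset`."""
    return offset // sub_size


def subcluster_range(index: int, sub_size: int = SUBCLUSTER_SIZE) -> range:
    """Intra-cluster offsets covered by logical sub-cluster S_index."""
    return range(index * sub_size, (index + 1) * sub_size)


def touched_subclusters(start: int, end: int,
                        sub_size: int = SUBCLUSTER_SIZE) -> list[int]:
    """Indices from the starting to the ending logical sub-cluster, inclusive."""
    return list(range(subcluster_index(start, sub_size),
                      subcluster_index(end, sub_size) + 1))


# Hypothetical write covering offsets 3000..9999 (inclusive).
touched = touched_subclusters(3000, 9999)
```

Here `touched` is `[1, 2, 3, 4]`: S1 is the starting logical sub-cluster and S4 the ending one.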

[0082] Step S404: Perform a preset write operation on the target logical sub-cluster; the target logical sub-cluster includes whichever of the starting logical sub-cluster and the ending logical sub-cluster has not yet had data written to it.

[0083] When performing additional preset write operations on the starting and ending logical sub-clusters, the preset write operation can be performed on the entire logical sub-cluster. However, the starting position (starting write offset address) and the ending position (ending write offset address) of the target data may not fall on a boundary between two logical sub-clusters. Performing the preset write operation on the entire logical sub-cluster may therefore write some areas redundantly, and it also requires that the preset write operation be performed first and the target data written afterwards, to keep the data in the data cluster correct.

[0084] In this embodiment, additional preset write operations are performed only on the necessary areas within the target logical sub-cluster. This avoids redundant data writing and also removes the ordering constraint: the preset write operations on logical sub-clusters and the writing of the target data may be performed in either order.

[0085] Specifically, step S404, "perform a preset write operation on the target logical sub-cluster", includes steps S4041 to S4044.

[0086] Step S4041: When data is being written to the starting logical sub-cluster for the first time, determine the first boundary offset address; the first boundary offset address is the offset address corresponding to the boundary between the starting logical sub-cluster and the previous logical sub-cluster.

[0087] If data is being written to the starting logical sub-cluster for the first time, it is a target logical sub-cluster and requires an additional preset write operation. Specifically, the adjacent previous logical sub-cluster can be identified, along with the boundary between the two; the intra-cluster offset address of this boundary is the first boundary offset address. If the starting logical sub-cluster is the first logical sub-cluster in the target data cluster (e.g., logical sub-cluster S0 in Figure 6), the first boundary offset address is the starting address of the target data cluster, namely 0.

[0088] Optionally, the first intra-cluster index $I_1$ corresponding to the starting logical sub-cluster can be determined, and the first boundary offset address calculated from it. Specifically, the first boundary offset address is $B_1 = I_1 \times S$.

[0089] Here, $B_1$ denotes the first boundary offset address, $I_1$ denotes the first intra-cluster index corresponding to the starting logical sub-cluster, and $S$ denotes the logical sub-cluster size (e.g., 2KB).

[0090] The first intra-cluster index may specifically be $I_1 = \lfloor O_{start} / S \rfloor$, where $O_{start}$ denotes the starting write offset address and $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer.

[0091] As shown in Figure 6, if each logical sub-cluster is 2KB in size, i.e., $S = 2048$, rounding down the ratio of the starting write offset address $O_{start}$ to the logical sub-cluster size $S$ gives the first intra-cluster index $I_1$. In Figure 6, the starting logical sub-cluster is logical sub-cluster S1, so the first intra-cluster index is $I_1 = 1$ and the first boundary offset address is $B_1 = 1 \times 2048 = 2048$.

[0092] Step S4042: Perform a preset write operation on the region between the first boundary offset address and the starting write offset address; the starting write offset address is the offset address corresponding to the starting position of the target data in the target data cluster.

[0093] When performing the additional preset write operation on the starting logical sub-cluster, the preset write operation is performed only on the region between the first boundary offset address and the starting write offset address.

[0094] As shown in Figure 6, the starting logical sub-cluster is logical sub-cluster S1, and the region between the first boundary offset address and the starting write offset address is a COW region. The preset write operation needs to be performed only on this COW region.

[0095] It can be understood that the COW region is smaller than 2KB; in the special case where the first boundary offset address equals the starting write offset address, the COW region is empty. In other words, the additional write operation handles less than 2KB of data, and possibly none.
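The head COW region of steps S4041 and S4042 can be sketched as follows, assuming 2KB logical sub-clusters and representing the region as a half-open interval. The function name `head_cow_region` and the sample offsets are hypothetical.

```python
SUBCLUSTER_SIZE = 2 * 1024  # 2KB logical sub-clusters, as in the example


def head_cow_region(start_offset: int,
                    sub_size: int = SUBCLUSTER_SIZE) -> tuple[int, int]:
    """Half-open region [first_boundary, start_offset) in front of the write.

    The first boundary is the start of the sub-cluster that contains
    `start_offset`, i.e. (index of that sub-cluster) * sub_size.
    """
    first_index = start_offset // sub_size
    first_boundary = first_index * sub_size
    return first_boundary, start_offset


lo, hi = head_cow_region(3000)             # write starts inside S1
aligned_lo, aligned_hi = head_cow_region(4096)  # write starts exactly on a boundary
```

For a write starting at offset 3000 the region is `[2048, 3000)`, which is 952 bytes; for a write starting on a sub-cluster boundary the region is empty.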

[0096] Step S4043: When data is being written to the ending logical sub-cluster for the first time, determine the second boundary offset address; the second boundary offset address is the offset address corresponding to the boundary between the ending logical sub-cluster and the next logical sub-cluster.

[0097] Step S4044: Perform a preset write operation on the region between the ending write offset address and the second boundary offset address; the ending write offset address is the offset address in the target data cluster corresponding to the ending position of the target data.

[0098] Similar to the starting logical sub-cluster, if data is being written to the ending logical sub-cluster for the first time, the ending logical sub-cluster is also a target logical sub-cluster. Specifically, the adjacent next logical sub-cluster can be identified, along with the boundary between the two; the intra-cluster offset address of this boundary is the second boundary offset address. If the ending logical sub-cluster is the last logical sub-cluster (e.g., logical sub-cluster S31 in Figure 6), the second boundary offset address is the end address of the target data cluster, namely 64×1024 – 1 = 65535.

[0099] Optionally, the second intra-cluster index $I_2$ corresponding to the ending logical sub-cluster can be determined, and the second boundary offset address calculated from it. Specifically, the second boundary offset address is $B_2 = (I_2 + 1) \times S - 1$, where $S$ denotes the logical sub-cluster size.

[0100] The second intra-cluster index may specifically be $I_2 = \lfloor O_{end} / S \rfloor$, where $O_{end}$ denotes the ending write offset address.

[0101] Furthermore, when performing the additional preset write operation on the ending logical sub-cluster, the preset write operation is performed only on the region between the ending write offset address $O_{end}$ and the second boundary offset address $B_2$.

[0102] As shown in Figure 6, if the ending logical sub-cluster is logical sub-cluster S4, the second intra-cluster index is $I_2 = 4$ and, correspondingly, the second boundary offset address is $B_2 = 5 \times 2048 - 1 = 10239$. The region between the ending write offset address and the second boundary offset address is another COW region, and the preset write operation needs to be performed only on this COW region.

[0103] In summary, after a write request is received, at most two additional preset write operations need to be performed, one per COW region. Furthermore, when the starting write offset address and the ending write offset address fall in the middle of a logical sub-cluster, each COW region is smaller than 2KB, which further reduces the amount of data handled by the additional write operations.

[0104] In addition, as shown in Figure 6, the two COW regions are adjacent to the target data write region but do not overlap it, so no data is written redundantly; the COW regions and the write region can be written in either order.
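The two COW regions can be computed together, as in the hedged sketch below. It assumes 2KB logical sub-clusters, treats the ending write offset and the second boundary offset address as inclusive addresses (consistent with the value 65535 given for S31), and returns half-open intervals for convenience. The function name, the boolean flags, and the sample offsets are hypothetical.

```python
SUBCLUSTER_SIZE = 2 * 1024  # 2KB logical sub-clusters, as in the example


def cow_regions(start: int, end: int,
                start_first_write: bool, end_first_write: bool,
                sub_size: int = SUBCLUSTER_SIZE) -> list[tuple[int, int]]:
    """Return the non-empty COW regions as half-open (lo, hi) intervals.

    `start` and `end` are intra-cluster offsets; `end` is the last byte
    written (inclusive). At most two regions are returned, and neither
    overlaps the write region [start, end].
    """
    regions = []
    if start_first_write:
        first_boundary = (start // sub_size) * sub_size
        if first_boundary < start:
            regions.append((first_boundary, start))
    if end_first_write:
        # Inclusive address of the last byte of the ending sub-cluster.
        second_boundary = (end // sub_size + 1) * sub_size - 1
        if end < second_boundary:
            regions.append((end + 1, second_boundary + 1))
    return regions


regions = cow_regions(3000, 9999, True, True)
extra_bytes = sum(hi - lo for lo, hi in regions)
```

For the sample write, the regions are `[2048, 3000)` and `[10000, 10240)`, 1192 bytes in total, well under the 4KB upper bound of two sub-clusters.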

[0105] In some optional implementations, the preset write operation may specifically be a zeroing operation or an operation that copies data from another data cluster. The above step S404, "perform a preset write operation on the target logical sub-cluster," may include steps c1 to c2.

[0106] Step c1: If the target data cluster is related to the first data cluster and the first data cluster is a snapshot data cluster, determine the intra-cluster index corresponding to the target logical sub-cluster; copy the data of the logical sub-cluster corresponding to the intra-cluster index in the first data cluster to the target logical sub-cluster in the target data cluster.

[0107] In this embodiment, the target data cluster may be related to a snapshot data cluster or may be unrelated to any snapshot data cluster. As described in steps a1 to a2 above, address mapping determines a data cluster A, which is the first data cluster. If data cluster A is a snapshot data cluster that is being written to for the first time, a new data cluster B is allocated and used as the target data cluster. In this case, the target data cluster (data cluster B) was allocated on the basis of the snapshot data cluster (data cluster A), so it is a data cluster related to that snapshot data cluster.

[0108] Furthermore, if a new write request is subsequently received, and the write address in that request is mapped to data cluster B, and this data cluster B is not being written to for the first time during the current write request processing (data was already written during the previous write request), then data cluster B is directly used as the target data cluster. In this case, the target data cluster (i.e., data cluster B) is still the data cluster associated with the snapshot data cluster (i.e., data cluster A).

[0109] For the target data cluster mentioned above (e.g., data cluster B mentioned above), if there is a target logical sub-cluster (including the starting logical sub-cluster and / or the ending logical sub-cluster) where data is written for the first time, the intra-cluster index of the target logical sub-cluster within the target data cluster can be determined, and the data to be copied can be located based on the intra-cluster index and copied to the target logical sub-cluster.

[0110] If the target logical sub-cluster includes the starting logical sub-cluster, the first intra-cluster index of the starting logical sub-cluster can be determined, and the data of the logical sub-cluster with that index in the first data cluster (e.g., data cluster A) is copied to the logical sub-cluster with the same index in the target data cluster (e.g., data cluster B), i.e., to the starting logical sub-cluster. Similarly, if the target logical sub-cluster includes the ending logical sub-cluster, the second intra-cluster index of the ending logical sub-cluster can be determined and the data copied to the ending logical sub-cluster.

[0111] It can be understood that the data in the first data cluster can be copied in the manner of steps S4041 to S4044. For example, as shown in Figure 6, for the COW region between the first boundary offset address and the starting write offset address, the data in the corresponding COW region of the first data cluster is read and written to the COW region of the target data cluster. This process is referred to as a COW operation.

[0112] Step c2: If the target data cluster is unrelated to the snapshot data cluster, write preset data to the target logical sub-cluster; wherein, the target data cluster is a data cluster that has a mapping relationship with the write address information, such as the first data cluster directly determined in step a1.

[0113] If the target data cluster is unrelated to the snapshot data cluster, meaning the target data cluster is a data cluster from a non-snapshot scenario, and if there exists a target logical sub-cluster where data is being written for the first time, then a COW operation is not required. Instead, preset data is directly written to the target logical sub-cluster; for example, a zeroing operation can be performed, i.e., writing 0.

[0114] Similarly, the zeroing operation can be applied only to the COW regions shown in Figure 6, reducing the amount of data written.
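The two variants of the preset write operation (steps c1 and c2) can be illustrated on in-memory buffers. This is a sketch under stated assumptions: clusters are modeled as `bytes`/`bytearray` objects rather than regions of a disk image, regions are half-open intervals, and the function name `preset_write` is hypothetical.

```python
from typing import Optional

CLUSTER_SIZE = 64 * 1024


def preset_write(target: bytearray, region: tuple[int, int],
                 snapshot: Optional[bytes] = None) -> None:
    """Fill `region` of the target cluster before or after the real write.

    With a snapshot cluster (step c1) the same byte range is copied from
    it; without one (step c2) the range is zeroed.
    """
    lo, hi = region
    if snapshot is not None:
        target[lo:hi] = snapshot[lo:hi]   # COW copy
    else:
        target[lo:hi] = bytes(hi - lo)    # zeroing


# Snapshot scenario: copy the head COW region from cluster A into cluster B.
cluster_a = bytes([0xAB]) * CLUSTER_SIZE
cluster_b = bytearray([0xFF]) * CLUSTER_SIZE
preset_write(cluster_b, (2048, 3000), snapshot=cluster_a)

# Non-snapshot scenario: zero the tail COW region.
plain = bytearray([0xFF]) * CLUSTER_SIZE
preset_write(plain, (10000, 10240))
```

Only the bytes inside the given region change; everything else in the target cluster is left as it was.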

[0115] After determining the first data cluster that has a mapping relationship with the write address information, it can be first determined whether it is related to the snapshot data cluster (i.e., whether it is a snapshot scenario). If the first data cluster is not related to the snapshot data, that is, it is not a snapshot scenario (for example, the first data cluster itself is not a snapshot data cluster, nor is it a newly allocated data cluster when writing data to a snapshot data cluster), then step c2 is executed.

[0116] In a snapshot scenario, it is necessary to further determine which data cluster is the target data cluster, that is, whether the first data cluster itself can be used as the target data cluster. For example, it can be determined whether the first data cluster is being written to for the first time. If it is, a new second data cluster (e.g., data cluster B mentioned above) is allocated for the first data cluster (e.g., data cluster A mentioned above), and this second data cluster is used as the target data cluster. If the first data cluster is not being written to for the first time, it can be used directly as the target data cluster.
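The selection of the target data cluster can be sketched with a small in-memory model. Everything here is a simplifying assumption for illustration: the `Cluster` class, its `is_snapshot` and `cow_source` fields, and the rule that a cluster flagged as a snapshot cluster is by definition one that has not yet been written since the snapshot was taken.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for a data cluster."""
    cluster_id: int
    is_snapshot: bool = False               # owned by a snapshot, must not be modified
    cow_source: Optional["Cluster"] = None  # snapshot cluster this one was allocated from


def select_target(first: Cluster, new_id: int) -> Cluster:
    """Pick the cluster that will actually receive the write."""
    if first.is_snapshot:
        # First write after the snapshot: allocate cluster B related to cluster A.
        return Cluster(cluster_id=new_id, cow_source=first)
    # Already-allocated cluster (snapshot-related or not): write in place.
    return first


a = Cluster(cluster_id=1, is_snapshot=True)
b = select_target(a, new_id=2)       # new cluster B, COW source is A
again = select_target(b, new_id=3)   # later write maps to B and reuses it
plain = Cluster(cluster_id=7)
same = select_target(plain, new_id=8)  # non-snapshot scenario
```

A target with a non-empty `cow_source` would then take the copy path for its COW regions, while a target without one would take the zeroing path.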

[0117] Step S405: Write the target data to the target data cluster.

[0118] For details, see step S204 of the embodiment shown in Figure 2, which is not repeated here.

[0119] In some alternative implementations, the target data cluster includes a sub-cluster bitmap that records whether data has been written to each logical sub-cluster in the target data cluster.

[0120] Furthermore, the method also includes: after writing the target data to the target data cluster, marking the start logical sub-cluster, the end logical sub-cluster, and other logical sub-clusters between the start logical sub-cluster and the end logical sub-cluster in the sub-cluster bitmap as written data.

[0121] In this embodiment, for each data cluster, a bitmap is provided to represent the state of logical sub-clusters, i.e., a sub-cluster bitmap. The sub-cluster bitmap includes multiple bits, each of which corresponds one-to-one with a logical sub-cluster and is used to indicate whether data has been written to it. Based on the sub-cluster bitmap, it is possible to determine which logical sub-clusters in the data cluster are writing data for the first time.

[0122] If, during the processing of this write request, data is written for the first time to the starting logical sub-cluster and/or the ending logical sub-cluster, the state of all logical sub-clusters involved can be updated. These include the starting and ending logical sub-clusters as well as the logical sub-clusters between them. Once the corresponding bits in the sub-cluster bitmap have been updated (e.g., changed from 0 to 1), a later query of the sub-cluster bitmap readily shows whether a logical sub-cluster is being written to for the first time.

[0123] Figure 5 also illustrates a sub-cluster bitmap. If the data cluster is divided into 32 logical sub-clusters, the sub-cluster bitmap is 32 bits long, with every bit initially 0. If the write region and COW regions are as shown in Figure 6, the bits corresponding to the starting logical sub-cluster S1, the ending logical sub-cluster S4, and the logical sub-clusters S2 and S3 between them need to be set to 1. The updated sub-cluster bitmap is shown in Figure 5.
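The bitmap update can be sketched with an integer used as a 32-bit bitmap. The bit ordering (bit i, counted from the least significant bit, stands for logical sub-cluster S_i) and the function name `mark_written` are assumptions for illustration; the text does not specify how the bits are laid out.

```python
def mark_written(bitmap: int, first_index: int, last_index: int) -> int:
    """Set the bits for sub-clusters first_index..last_index (inclusive)."""
    count = last_index - first_index + 1
    mask = ((1 << count) - 1) << first_index
    return bitmap | mask


# Figure 6 case: S1 is the starting and S4 the ending logical sub-cluster.
updated = mark_written(0, 1, 4)
as_text = format(updated, "032b")
```

Starting from an all-zero bitmap, the result has exactly the bits for S1 through S4 set.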

[0124] Optionally, whether data is being written to the starting logical sub-cluster and the ending logical sub-cluster for the first time can be determined based on the sub-cluster bitmap. Accordingly, the method further includes steps d1 to d2.

[0125] Step d1: Determine, based on the sub-cluster bitmap of the target data cluster, whether data has been written to the starting logical sub-cluster and the ending logical sub-cluster. The sub-cluster bitmap records whether data has been written to each logical sub-cluster in the target data cluster.

[0126] Step d2: If no data is written to the starting logical subcluster, the starting logical subcluster is taken as a target logical subcluster; if no data is written to the ending logical subcluster, the ending logical subcluster is taken as a target logical subcluster.

[0127] After the starting and ending logical sub-clusters have been determined, their intra-cluster indices (the first and second intra-cluster indices) can be determined. Querying the sub-cluster bitmap with these two indices gives the value recorded for each of the two logical sub-clusters. If the value for the starting logical sub-cluster is 0, no data has been written to it, so this is its first write and it is taken as a target logical sub-cluster. If the value is 1, data has already been written to it and it is not a target logical sub-cluster. The ending logical sub-cluster is handled in the same way, which is not elaborated further.
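Steps d1 and d2 amount to two bit tests. The sketch below assumes the same illustrative layout as an integer bitmap in which bit i (from the least significant bit) stands for S_i; the helper names are hypothetical.

```python
def is_written(bitmap: int, index: int) -> bool:
    """True if the bit for logical sub-cluster S_index is set."""
    return ((bitmap >> index) & 1) == 1


def target_subclusters(bitmap: int, first_index: int, last_index: int) -> list[int]:
    """Indices among {starting, ending} sub-cluster that have no data yet."""
    targets = []
    for index in (first_index, last_index):
        if not is_written(bitmap, index) and index not in targets:
            targets.append(index)
    return targets


fresh = target_subclusters(0b00000, 1, 4)    # nothing written yet
partial = target_subclusters(0b00010, 1, 4)  # S1 already written
done = target_subclusters(0b11110, 1, 4)     # S1..S4 already written
```

Only the starting and ending sub-clusters are tested; the sub-clusters in between are overwritten in full by the target data and need no preset write.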

[0128] The following uses a snapshot-based COW operation as an example to illustrate one implementation flow of this data processing method. As shown in Figure 7, the flow includes the following steps.

[0129] Step S701: Initialize configuration.

[0130] When the system starts, it reads the QCOW2 header information, which includes the data cluster size (e.g., 64KB), L1 table, L2 table, etc. The number of logical subclusters can be configured; for example, if the number is 32, the size of each logical subcluster is: data cluster size / 32, which is 2KB. Furthermore, the metadata structure of the data cluster can be initialized, including the subcluster bitmap.
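As a rough illustration of step S701, the sketch below derives the logical sub-cluster size from the first 24 bytes of a QCOW2 header. It relies on the published QCOW2 header layout (big-endian fields, magic at byte 0, `cluster_bits` at byte 20); the function name, the default of 32 sub-clusters, and the hand-built sample header are assumptions for the example, and L1/L2 table handling is omitted.

```python
import struct

QCOW2_MAGIC = b"QFI\xfb"


def subcluster_size_from_header(header: bytes, subcluster_count: int = 32) -> int:
    """Return the logical sub-cluster size implied by a QCOW2 header.

    Fields read: magic (4 bytes), version (u32), backing_file_offset (u64),
    backing_file_size (u32), cluster_bits (u32), all big-endian.
    """
    magic, _version, _backing_off, _backing_len, cluster_bits = struct.unpack_from(
        ">4sIQII", header, 0)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a QCOW2 image")
    cluster_size = 1 << cluster_bits
    if cluster_size % subcluster_count != 0:
        raise ValueError("cluster size must be a multiple of the sub-cluster count")
    return cluster_size // subcluster_count


# Hand-built header fragment: version 3, no backing file, 64KB clusters.
sample_header = struct.pack(">4sIQII", QCOW2_MAGIC, 3, 0, 0, 16)
sub_size = subcluster_size_from_header(sample_header)
```

With `cluster_bits` equal to 16 the cluster size is 64KB, and dividing by 32 gives 2KB logical sub-clusters.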

[0131] Step S702: The virtual machine initiates a write request.

[0132] The write request specifically includes: the target data to be written, the size (len) of the target data, and the client offset address.

[0133] Step S703: Map the client offset address to the host offset address and determine the first data cluster with the mapping relationship.

[0134] For example, the QCOW2 layer maps the client offset address to the host offset address.

[0135] Step S704: Determine whether the first data cluster is being written to for the first time. If yes, continue to step S705; otherwise, continue to step S706.

[0136] For example, in a snapshot scenario, it can be determined whether the first data cluster belongs to an existing data cluster in order to determine whether a COW operation needs to be performed.

[0137] Step S705: Allocate a new second data cluster and use it as the target data cluster.

[0138] Step S706: Use the first data cluster as the target data cluster.

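[0139] Step S707: Calculate the starting write offset address and the ending write offset address of the write region, and the corresponding first intra-cluster index and second intra-cluster index.

[0140] For example, as described above, the starting write offset address is $O_{start} = O_{host} \bmod C$ and the ending write offset address is $O_{end} = (O_{host} + L - 1) \bmod C$, where $O_{host}$ is the host offset address, $C$ is the data cluster size, and $L$ is the length of the target data.

[0141] The first intra-cluster index is $I_1 = \lfloor O_{start} / S \rfloor$ and the second intra-cluster index is $I_2 = \lfloor O_{end} / S \rfloor$, where $S$ is the logical sub-cluster size.

[0142] The first intra-cluster index is the index of the starting logical sub-cluster, and the second intra-cluster index is the index of the ending logical sub-cluster.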

[0143] Step S708: Determine the target logical subcluster for the first data write.

[0144] Specifically, based on the first intra-cluster index and the second intra-cluster index, the sub-cluster bitmap of the target data cluster can be queried to determine whether data is being written to the starting logical sub-cluster and the ending logical sub-cluster for the first time; each logical sub-cluster receiving its first write is taken as a target logical sub-cluster.

[0145] It can be understood that if the first data cluster is being written to for the first time, then every logical sub-cluster of the newly allocated second data cluster is also being written to for the first time.

[0146] Step S709: Determine the sub-cluster boundaries of the target logical sub-cluster, and the corresponding COW regions.

[0147] For the starting logical sub-cluster, the sub-cluster boundary is the first boundary offset address $B_1 = I_1 \times S$; for the ending logical sub-cluster, the sub-cluster boundary is the second boundary offset address $B_2 = (I_2 + 1) \times S - 1$. Here, $I_1$ and $I_2$ are the first and second intra-cluster indices and $S$ is the logical sub-cluster size. The two COW regions are shown schematically in Figure 6.

[0148] Step S710: Copy the data in the COW region of the first data cluster to the COW region of the target data cluster; write the target data to the write region.

[0149] Step S711: Update the sub-cluster bitmap of the target data cluster.
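Steps S707 to S711 can be put together in a compact in-memory sketch. It is illustrative only: clusters are byte buffers, the bitmap is an integer whose bit i stands for S_i, the write must stay inside one cluster, and address mapping and cluster allocation (steps S703 to S706) are assumed to have happened already. The function name `handle_write` and the sample data are hypothetical.

```python
from typing import Optional

CLUSTER_SIZE = 64 * 1024
SUB_SIZE = 2 * 1024


def handle_write(target: bytearray, bitmap: int, host_offset: int, data: bytes,
                 snapshot: Optional[bytes] = None) -> tuple[int, int]:
    """Apply one write; return (updated bitmap, extra bytes written for COW/zeroing)."""
    # S707: intra-cluster offsets (end is inclusive) and sub-cluster indices.
    start = host_offset % CLUSTER_SIZE
    if not data or start + len(data) > CLUSTER_SIZE:
        raise ValueError("sketch handles non-empty writes within one cluster")
    end = start + len(data) - 1
    i1, i2 = start // SUB_SIZE, end // SUB_SIZE

    # S708 + S709: COW regions (half-open) for first-written edge sub-clusters.
    regions = []
    if ((bitmap >> i1) & 1) == 0:
        regions.append((i1 * SUB_SIZE, start))
    if ((bitmap >> i2) & 1) == 0:
        # Exclusive end; the text gives this boundary as an inclusive address.
        regions.append((end + 1, (i2 + 1) * SUB_SIZE))

    # S710: copy from the snapshot cluster (or zero), then write the target data.
    extra = 0
    for lo, hi in regions:
        target[lo:hi] = snapshot[lo:hi] if snapshot is not None else bytes(hi - lo)
        extra += hi - lo
    target[start:end + 1] = data

    # S711: mark S_i1..S_i2 as written.
    mask = ((1 << (i2 - i1 + 1)) - 1) << i1
    return bitmap | mask, extra


snapshot = bytes(range(256)) * 256           # 64KB of recognizable snapshot data
target = bytearray(CLUSTER_SIZE)             # newly allocated cluster
payload = b"\x7f" * 7000
bitmap, extra = handle_write(target, 0, 3 * CLUSTER_SIZE + 3000, payload, snapshot)
bitmap2, extra2 = handle_write(target, bitmap, 3 * CLUSTER_SIZE + 3000, payload, snapshot)
```

The first call copies 1192 bytes of snapshot data around the 7000-byte payload; repeating the same write copies nothing, because the bitmap already marks S1 to S4 as written.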

[0150] The data processing method provided in this embodiment logically divides the data cluster into multiple logical sub-clusters. Based on these logical sub-clusters, fine-grained management can be achieved, reducing the data volume of COW operations or zeroing operations to less than the size of two logical sub-clusters, for example from 64KB to less than 4KB. This reduces the I/O load by more than 90%, which is especially beneficial in high-concurrency scenarios, effectively reducing the IOPS (Input/Output Operations Per Second) consumed and significantly improving snapshot performance. Furthermore, the method does not change the QCOW2 disk format, requires no migration of existing images, is highly compatible, can be applied as an enhancement patch for cloud servers and other platforms, and is easy to integrate.

[0151] This embodiment also provides a data processing apparatus for implementing the above embodiments and preferred embodiments; details already described will not be repeated. As used below, the term "module" can refer to a combination of software and/or hardware that performs a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
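The "more than 90%" figure can be checked with a short worked calculation, assuming a 64KB data cluster, 2KB logical sub-clusters, and a baseline in which the COW or zeroing operation covers the whole cluster:

```latex
\[
\text{extra data written} \;<\; 2 \times 2\,\mathrm{KB} = 4\,\mathrm{KB},
\qquad
\text{reduction} \;>\; \frac{64\,\mathrm{KB} - 4\,\mathrm{KB}}{64\,\mathrm{KB}}
= \frac{60}{64} = 93.75\%.
\]
```

Because each COW region is strictly smaller than one logical sub-cluster, 93.75% is a lower bound under these assumptions; a write aligned to sub-cluster boundaries needs no additional data at all.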

[0152] This embodiment provides a data processing apparatus. As shown in Figure 8, the apparatus includes:
an acquisition module 801, configured to acquire a write request, the write request including target data to be written and write address information;
a determining module 802, configured to determine, according to the write address information, the target data cluster for writing the target data, and to determine the starting logical sub-cluster and the ending logical sub-cluster; the target data cluster includes multiple logical sub-clusters, the starting logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the starting position for writing the target data, and the ending logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the ending position for writing the target data;
a processing module 803, configured to perform a preset write operation on the target logical sub-cluster; the target logical sub-cluster includes whichever of the starting logical sub-cluster and the ending logical sub-cluster has not yet had data written to it; and
an operation module 804, configured to write the target data into the target data cluster.

[0153] In some optional implementations, determining the target data cluster for writing the target data based on the write address information includes: performing address mapping based on the write address information to determine a first data cluster that has a mapping relationship with the write address information; and, if the first data cluster is a snapshot data cluster that is being written to for the first time, allocating a new second data cluster and using the second data cluster as the target data cluster.

[0154] In some optional implementations, performing a preset write operation on the target logical subcluster includes: In the case where the target data cluster is related to the first data cluster and the first data cluster is a snapshot data cluster, the intra-cluster index corresponding to the target logical sub-cluster is determined; the data of the logical sub-cluster corresponding to the intra-cluster index in the first data cluster is copied to the target logical sub-cluster in the target data cluster; When the target data cluster is unrelated to the snapshot data cluster, preset data is written to the target logical sub-cluster; wherein, the target data cluster is a data cluster that has a mapping relationship with the write address information.

[0155] In some optional implementations, performing a preset write operation on the target logical sub-cluster includes: when data is being written to the starting logical sub-cluster for the first time, determining a first boundary offset address, the first boundary offset address being the offset address corresponding to the boundary between the starting logical sub-cluster and the previous logical sub-cluster; performing a preset write operation on the region between the first boundary offset address and the starting write offset address, the starting write offset address being the offset address in the target data cluster corresponding to the starting position for writing the target data; when data is being written to the ending logical sub-cluster for the first time, determining a second boundary offset address, the second boundary offset address being the offset address corresponding to the boundary between the ending logical sub-cluster and the next logical sub-cluster; and performing a preset write operation on the region between the ending write offset address and the second boundary offset address, the ending write offset address being the offset address in the target data cluster corresponding to the ending position for writing the target data.

[0156] In some optional implementations, determining the start logical subcluster and the end logical subcluster includes: Based on the write address information, determine the starting write offset address corresponding to the starting position for writing the target data, and the ending write offset address corresponding to the ending position for writing the target data; The logical sub-cluster corresponding to the starting write offset address in the target data cluster is taken as the starting logical sub-cluster, and the logical sub-cluster corresponding to the ending write offset address is taken as the ending logical sub-cluster.

[0157] In some optional implementations, the starting write offset address is $O_{start} = O_{host} \bmod C$, and the ending write offset address is $O_{end} = (O_{host} + L - 1) \bmod C$; where $O_{start}$ denotes the starting write offset address, $O_{host}$ denotes the host offset address determined from the write address information, $C$ denotes the data cluster size, $O_{end}$ denotes the ending write offset address, $L$ denotes the length of the target data, and $\bmod$ denotes the modulo operation.

[0158] In some optional implementations, the first boundary offset address is $B_1 = I_1 \times S$, and the second boundary offset address is $B_2 = (I_2 + 1) \times S - 1$; where $B_1$ denotes the first boundary offset address, $I_1$ denotes the first intra-cluster index corresponding to the starting logical sub-cluster, $S$ denotes the logical sub-cluster size, $B_2$ denotes the second boundary offset address, and $I_2$ denotes the second intra-cluster index corresponding to the ending logical sub-cluster.

[0159] In some optional implementations, the first intra-cluster index is $I_1 = \lfloor O_{start} / S \rfloor$, and the second intra-cluster index is $I_2 = \lfloor O_{end} / S \rfloor$; where $O_{start}$ denotes the starting write offset address, $O_{end}$ denotes the ending write offset address, $S$ denotes the logical sub-cluster size, and $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer.

[0160] In some optional embodiments, the processing module 803 is further configured to: determine, based on the sub-cluster bitmap of the target data cluster, whether data has been written to the starting logical sub-cluster and the ending logical sub-cluster, the sub-cluster bitmap recording whether data has been written to each logical sub-cluster in the target data cluster; if no data has been written to the starting logical sub-cluster, take the starting logical sub-cluster as a target logical sub-cluster; and if no data has been written to the ending logical sub-cluster, take the ending logical sub-cluster as a target logical sub-cluster.

[0161] In some optional embodiments, the processing module 803 is further configured to: After the target data is written to the target data cluster, the starting logical sub-cluster, the ending logical sub-cluster, and other logical sub-clusters between the starting logical sub-cluster and the ending logical sub-cluster in the sub-cluster bitmap are marked as data that has been written.

[0162] The data processing apparatus provided in this disclosure can execute the data processing method provided in any embodiment of this disclosure, and has the corresponding functional modules and beneficial effects for executing the method. Further functional descriptions of the various modules and units described above are the same as in the corresponding embodiments described above, and will not be repeated here.
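The way the modules cooperate on a single write can be sketched end to end. This is an illustration under stated assumptions, not the implementation of the application: the data cluster is modeled as an in-memory `bytearray`, the preset write operation is zero filling (the case without a snapshot data cluster), the ending offset is exclusive, and the bitmap is an integer. The name `write_with_subclusters` is hypothetical.

```python
def write_with_subclusters(cluster: bytearray, bitmap: int, start: int,
                           data: bytes, subcluster_size: int) -> int:
    """Write data at in-cluster offset start and return the updated bitmap.

    Only the unwritten head gap of the starting sub-cluster and the
    unwritten tail gap of the ending sub-cluster are zero filled.
    """
    end = start + len(data)  # exclusive end offset (assumption)
    if not data or start < 0 or end > len(cluster):
        raise ValueError("write does not fit in the data cluster")
    first_index = start // subcluster_size
    second_index = (end - 1) // subcluster_size

    if not (bitmap >> first_index) & 1:
        head = first_index * subcluster_size          # first boundary offset
        cluster[head:start] = bytes(start - head)     # zero fill the head gap
    if not (bitmap >> second_index) & 1:
        tail = (second_index + 1) * subcluster_size   # second boundary offset
        cluster[end:tail] = bytes(tail - end)         # zero fill the tail gap

    cluster[start:end] = data
    for index in range(first_index, second_index + 1):
        bitmap |= 1 << index
    return bitmap


# Example: a 16-byte cluster with 4-byte sub-clusters, initially holding stale bytes.
cluster = bytearray(b"\xff" * 16)
new_bitmap = write_with_subclusters(cluster, 0, 5, b"AB", 4)
print(bin(new_bitmap), bytes(cluster[4:8]))
```

In the example only sub-cluster 1 is touched: the bytes on either side of the payload inside that sub-cluster are zeroed, and the other three sub-clusters are left untouched rather than being filled as they would be in a whole-cluster operation.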

[0163] Figure 9 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

[0164] Referring now to Figure 9, a schematic structural diagram of an electronic device suitable for implementing the embodiments of this application is shown. The electronic device may include a processor (e.g., a central processing unit, graphics processor, etc.) 901, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 902 or a program loaded from storage device 908 into random access memory (RAM) 903. The RAM 903 also stores various programs and data required for the operation of the electronic device. The processor 901, ROM 902, and RAM 903 are interconnected via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

[0165] Typically, the following devices can be connected to the I/O interface 905: input devices 906 including, for example, touchscreens, touchpads, keyboards, mice, cameras, microphones, accelerometers, gyroscopes, etc.; output devices 907 including, for example, liquid crystal displays (LCDs), speakers, vibrators, etc.; storage devices 908 including, for example, magnetic tapes, hard disks, etc.; and communication devices 909. The communication device 909 allows the electronic device to exchange data with other devices via wireless or wired communication. Although Figure 9 shows an electronic device with various devices, it should be understood that not all of the devices shown need to be implemented or present; more or fewer devices may be implemented or present instead.

[0166] Specifically, according to embodiments of this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this application include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 909, installed from the storage device 908, or installed from the ROM 902. When the computer program is executed by the processor 901, it performs the functions defined in the data processing method of the embodiments of this application.

[0167] The electronic device shown in Figure 9 is merely an example and should not impose any limitation on the functionality and scope of use of the embodiments of this application.

[0168] This application also provides a computer-readable storage medium. The methods described in this application can be implemented in hardware or firmware, or as software or computer code recordable on a storage medium, or as computer code that is originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network, and then stored on a local storage medium. Thus, the methods described herein can be processed by software stored on a storage medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, optical disk, read-only memory, random access memory, flash memory, hard disk, solid-state drive, etc.; further, the storage medium can also include combinations of the above types of memory. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes storage components capable of storing or receiving software or computer code. When the software or computer code is accessed and executed by the computer, processor, or hardware, the data processing methods shown in the above embodiments are implemented.

[0169] A portion of this application can be applied as a computer program product, such as computer program instructions, which, when executed by a computer, can invoke or provide the methods and / or technical solutions according to this application through the operation of the computer. Those skilled in the art will understand that the forms in which computer program instructions exist in a computer-readable medium include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executing the instructions, or the computer compiling the instructions and then executing the corresponding compiled program, or the computer reading and executing the instructions, or the computer reading and installing the instructions and then executing the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium accessible to a computer.

[0170] Although embodiments of this application have been described in conjunction with the accompanying drawings, those skilled in the art can make various modifications and variations without departing from the spirit and scope of this application, and all such modifications and variations fall within the scope defined by the appended claims.

Claims

1. A data processing method, characterized in that the method includes: obtaining a write request, the write request including target data to be written and write address information; determining, based on the write address information, a target data cluster for writing the target data, and determining a starting logical sub-cluster and an ending logical sub-cluster, wherein the target data cluster includes multiple logical sub-clusters, the starting logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the starting position for writing the target data, and the ending logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the ending position for writing the target data; performing a preset write operation on a target logical sub-cluster, wherein the target logical sub-cluster includes each of the starting logical sub-cluster and the ending logical sub-cluster to which no data has yet been written; and writing the target data into the target data cluster.

2. The method according to claim 1, characterized in that determining the target data cluster for writing the target data based on the write address information includes: performing address mapping based on the write address information to determine a first data cluster that has a mapping relationship with the write address information; and if the first data cluster is a snapshot data cluster to which data is being written for the first time, allocating a new second data cluster and taking the second data cluster as the target data cluster.

3. The method according to claim 2, characterized in that performing the preset write operation on the target logical sub-cluster includes: in the case where the target data cluster is related to the first data cluster and the first data cluster is a snapshot data cluster, determining the intra-cluster index corresponding to the target logical sub-cluster, and copying the data of the logical sub-cluster corresponding to the intra-cluster index in the first data cluster to the target logical sub-cluster in the target data cluster; and in the case where the target data cluster is unrelated to the snapshot data cluster, writing preset data to the target logical sub-cluster, wherein the target data cluster is a data cluster that has a mapping relationship with the write address information.

4. The method according to any one of claims 1 to 3, characterized in that performing the preset write operation on the target logical sub-cluster includes: in the case where data is being written to the starting logical sub-cluster for the first time, determining a first boundary offset address, the first boundary offset address being the offset address corresponding to the boundary between the starting logical sub-cluster and the previous logical sub-cluster, and performing the preset write operation on the region between the first boundary offset address and a starting write offset address, the starting write offset address being the offset address corresponding to the starting position for writing the target data in the target data cluster; and in the case where data is being written to the ending logical sub-cluster for the first time, determining a second boundary offset address, the second boundary offset address being the offset address corresponding to the boundary between the ending logical sub-cluster and the next logical sub-cluster, and performing the preset write operation on the region between an ending write offset address and the second boundary offset address, the ending write offset address being the offset address corresponding to the ending position for writing the target data in the target data cluster.

5. The method according to claim 4, characterized in that determining the starting logical sub-cluster and the ending logical sub-cluster includes: determining, based on the write address information, the starting write offset address corresponding to the starting position for writing the target data and the ending write offset address corresponding to the ending position for writing the target data; and taking the logical sub-cluster corresponding to the starting write offset address in the target data cluster as the starting logical sub-cluster, and the logical sub-cluster corresponding to the ending write offset address as the ending logical sub-cluster.

6. The method according to claim 1, characterized in that the method further includes: determining, based on the sub-cluster bitmap of the target data cluster, whether data has been written to the starting logical sub-cluster and to the ending logical sub-cluster, wherein the sub-cluster bitmap records whether each logical sub-cluster in the target data cluster has been written with data; if no data has been written to the starting logical sub-cluster, taking the starting logical sub-cluster as a target logical sub-cluster; and if no data has been written to the ending logical sub-cluster, taking the ending logical sub-cluster as a target logical sub-cluster.

7. The method according to claim 6, characterized in that the method further includes: after the target data is written to the target data cluster, marking the starting logical sub-cluster, the ending logical sub-cluster, and every logical sub-cluster between them as having been written with data in the sub-cluster bitmap.

8. A data processing apparatus, characterized in that the apparatus includes: an acquisition module, configured to obtain a write request, the write request including target data to be written and write address information; a determining module, configured to determine, based on the write address information, a target data cluster for writing the target data, and to determine a starting logical sub-cluster and an ending logical sub-cluster, wherein the target data cluster includes multiple logical sub-clusters, the starting logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the starting position for writing the target data, and the ending logical sub-cluster is the logical sub-cluster in the target data cluster corresponding to the ending position for writing the target data; a processing module, configured to perform a preset write operation on a target logical sub-cluster, wherein the target logical sub-cluster includes each of the starting logical sub-cluster and the ending logical sub-cluster to which no data has yet been written; and an operation module, configured to write the target data into the target data cluster.

9. An electronic device, characterized by including: a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the data processing method of any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions for causing a computer to perform the data processing method according to any one of claims 1 to 7.

11. A computer program product, characterized by including computer instructions for causing a computer to perform the data processing method according to any one of claims 1 to 7.