Service system, data writing and reading method, and device, medium and program product
By optimizing the data writing process between the cloud disk client and the storage service node, network bandwidth consumption is reduced, the network bandwidth bottleneck of the storage cluster is solved, and the throughput is improved.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- CLOUD INTELLIGENCE ASSETS HOLDING (SINGAPORE) PTE LTD
- Filing Date
- 2025-10-28
- Publication Date
- 2026-06-18
AI Technical Summary
In existing technologies, when cloud computing big data processing systems use Elastic Block Storage (EBS) cloud disks, the network bandwidth of the storage cluster becomes a bottleneck for high throughput, resulting in limited throughput capacity of the storage cluster.
To reduce network bandwidth usage on the storage cluster along the data path, the cloud disk client no longer sends write requests directly to the storage service node. Instead, it sends the metadata of each write request to the storage service node, which selects data block replicas and obtains their address information. The cloud disk client then writes the data directly to the data storage nodes.
This reduces the bandwidth amplification required for write requests to the storage cluster, thereby reducing network bandwidth consumption and improving the throughput of the storage cluster.

Figure CN2025130697_18062026_PF_FP_ABST
Abstract
Description
Service systems, data writing and reading methods, devices, media and program products
[0001] This disclosure claims priority to Chinese Patent Application No. 202411844659.5, filed with the China Patent Office on December 13, 2024, entitled “Service System, Data Writing and Reading Method, Apparatus, Media and Program Product”, the entire contents of which are incorporated herein by reference.
Technical Field
[0002] This disclosure relates to the field of storage technology, and in particular to a service system, data writing and reading method, device, medium and program product.
Background Technology
[0003] With the increasing prevalence and adoption of cloud computing, more and more users are utilizing cloud services for large-scale data processing, data warehouse analysis, and offline computing. These data processing systems typically use cloud disks to store intermediate data for data processing tasks, enabling rapid data access and exchange. Data processing systems, especially big data computing systems, often utilize a large number of cloud disks in parallel.
[0004] Compared to online real-time applications, these big data processing tasks have relatively lower latency requirements, but place higher demands on the throughput of storage clusters, especially when using Elastic Block Storage (EBS) cloud disks, where high throughput becomes a critical requirement. The bottleneck for storage clusters to provide higher cloud disk throughput is usually the network bandwidth limit of the storage cluster.
Summary of the Invention
[0005] This disclosure provides a service system, data writing and reading method, device, media, and program product for reducing network bandwidth consumption of the storage cluster on the data path, thereby improving the throughput of the storage cluster.
[0006] This disclosure provides a service system, including: a computing cluster and a storage cluster; the computing cluster is equipped with a cloud disk client; the storage cluster includes: a storage service system and a data storage system; the storage service system includes: multiple storage service nodes; the data storage system includes: multiple data storage nodes and metadata storage nodes;
[0007] The cloud disk client is used to respond to a write request by sending the metadata of the write request to the first storage service node among the plurality of storage service nodes; the first storage service node is the storage service node that serves the first cloud disk to be written by the write request.
[0008] The first storage service node is configured to, in response to the metadata of the write request, select a target number of data block replicas from the first cloud disk for the write request; and request the target number of data block replicas from the metadata storage node;
[0009] The metadata storage node is configured to respond to a request from the first storage service node by returning the address information of the target data storage node where the target number of data block replicas are located to the first storage service node.
[0010] The first storage service node is also configured to send the identifiers of the target number of data block replicas, the address information of the target data storage node, and the data storage method to the cloud disk client;
[0011] The cloud disk client is used to write the data to be written corresponding to the write request to the target data storage node according to the identifiers of the target number of data block replicas and the address information of the target data storage node.
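The write flow of [0007] to [0011] can be illustrated with a minimal in-memory simulation. This is a sketch under stated assumptions, not the disclosed implementation: all class, method, replica and node names (for example handle_write_metadata, client_write, "dn-a") are hypothetical, the network is replaced by direct method calls, and the data storage method is assumed to be 3-replica storage, so the client writes the same payload to every selected replica (with erasure coding it would instead write distinct data fragments and check blocks).

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStorageNode:
    # Hypothetical: data block replica ID -> address of the data storage node holding it.
    replica_locations: dict

    def lookup(self, replica_ids):
        return {rid: self.replica_locations[rid] for rid in replica_ids}


@dataclass
class DataStorageNode:
    blocks: dict = field(default_factory=dict)

    def write(self, replica_id, data):
        self.blocks[replica_id] = data


@dataclass
class StorageServiceNode:
    # Hypothetical first storage service node serving one cloud disk.
    free_replicas: list                # unused data block replicas of the first cloud disk
    meta_node: MetadataStorageNode
    storage_method: str = "3-replica"  # assumed preset data storage method
    target_number: int = 3             # Q, assumed equal to the replica count of that method

    def handle_write_metadata(self, disk_id, start_lba, length):
        # Only metadata arrives here; the payload never passes through this node.
        chosen = self.free_replicas[:self.target_number]
        self.free_replicas = self.free_replicas[self.target_number:]
        addresses = self.meta_node.lookup(chosen)  # ask the metadata storage node
        return {"replica_ids": chosen, "addresses": addresses,
                "method": self.storage_method}


def client_write(service_node, data_nodes, disk_id, start_lba, data):
    # Step 1: send only the write request's metadata (disk ID, start LBA, length).
    plan = service_node.handle_write_metadata(disk_id, start_lba, len(data))
    # Step 2: write the payload directly to each target data storage node.
    for rid in plan["replica_ids"]:
        data_nodes[plan["addresses"][rid]].write(rid, data)
    return plan


meta = MetadataStorageNode({"c1-r0": "dn-a", "c1-r1": "dn-b", "c1-r2": "dn-c"})
bs = StorageServiceNode(["c1-r0", "c1-r1", "c1-r2"], meta)
nodes = {name: DataStorageNode() for name in ("dn-a", "dn-b", "dn-c")}
plan = client_write(bs, nodes, "disk-1", 0, b"hello")
```

The point of the sketch is structural: handle_write_metadata receives only the disk identifier, starting LBA and length, so the data to be written never traverses the storage service node.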
[0012] Optionally, the cloud disk client is further configured to: send a read request to a second storage service node among the plurality of storage service nodes; the second storage service node is a storage service node that serves the second cloud disk to be read by the read request;
[0013] The second storage service node is used to respond to the read request by obtaining the address information of the data block to be read by the read request; and returning the address information of the data block to be read to the cloud disk client.
[0014] The cloud disk client is used to read data from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
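The read flow of [0012] to [0014] can be sketched in the same hypothetical style. The LBA-keyed location map, the single-block read and reading from one replica are assumptions made for illustration; the text only states that the second storage service node returns the address information of the data block and the client reads from the data storage node.

```python
from dataclasses import dataclass, field


@dataclass
class DataStorageNode:
    blocks: dict = field(default_factory=dict)

    def read(self, block_id):
        return self.blocks[block_id]


@dataclass
class StorageServiceNode:
    # Hypothetical second storage service node: LBA -> (block ID, data storage node address).
    lba_map: dict

    def handle_read(self, disk_id, lba):
        # disk_id is unused in this single-disk sketch.
        # Return only where the block lives; the data itself is not relayed here.
        return self.lba_map[lba]


def client_read(service_node, data_nodes, disk_id, lba):
    block_id, address = service_node.handle_read(disk_id, lba)
    # Read straight from the data storage node that holds the block.
    return data_nodes[address].read(block_id)


nodes = {"dn-a": DataStorageNode({"c7-r0": b"payload"})}
bs = StorageServiceNode({4096: ("c7-r0", "dn-a")})
data = client_read(bs, nodes, "disk-2", 4096)
```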
[0015] This disclosure also provides a data writing method, applicable to cloud disk clients deployed on computing clusters, the method comprising:
[0016] In response to a write request, the metadata of the write request is sent to a first storage service node among multiple storage service nodes. The metadata of the write request is used to trigger the first storage service node to select a target number of data block replicas for the write request, and to return the address information of the target data storage node where the target number of data block replicas are located and the identifier of the target number of data block replicas. The first storage service node is a storage service node that serves the first cloud disk to be written by the write request. The target data storage node belongs to a data storage system. The data storage system further includes a metadata storage node. The address information of the target data storage node is obtained by the metadata storage node.
[0017] Based on the identifiers of the target number of data block replicas and the address information of the target data storage node, the data to be written corresponding to the write request is written to the target data storage node.
[0018] This disclosure also provides a data writing method, applicable to a storage service node in a storage service system; the storage service system belongs to a storage cluster; the storage cluster further includes a data storage system; the data storage system includes: multiple data storage nodes and metadata storage nodes;
[0019] The method includes:
[0020] Obtain the metadata of a write request sent by a cloud disk client in the computing cluster;
[0021] In response to the metadata of the write request, select a target number of data block replicas from the first cloud disk for the write request;
[0022] The target number of data block replicas are requested from the metadata storage node so that the metadata storage node can return the address information of the target data storage node where the target number of data block replicas are located;
[0023] The identifiers of the target number of data block replicas and the address information of the target data storage node are sent to the cloud disk client, so that the cloud disk client can write the data to be written corresponding to the write request to the target data storage node according to the identifiers of the target number of data block replicas and the address information of the target data storage node.
[0024] This disclosure also provides a data reading method, applicable to a cloud disk client in a computing cluster, the method comprising:
[0025] A read request is sent to a second storage service node among multiple storage service nodes; the second storage service node is a storage service node that serves the second cloud disk to be read by the read request; the multiple storage service nodes belong to a storage service system in a storage cluster; the storage cluster also includes a data storage system; the data storage system includes the data storage node where the data block to be read is located;
[0026] Obtain the address information of the data block to be read in the read request returned by the second storage service node;
[0027] Data is read from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
[0028] This disclosure also provides a data reading method, applicable to a storage service node in a storage service system, wherein the storage service system belongs to a storage cluster; the storage cluster further includes a data storage system; the data storage system includes: a data storage node for storing data; the method includes:
[0029] Obtain a read request sent by a cloud disk client in the computing cluster;
[0030] In response to the read request, the address information of the data block to be read is obtained; the data block to be read is located in the data storage node;
[0031] The address information of the data block to be read is returned to the cloud disk client so that the cloud disk client can read the data from the data storage node where the data block to be read is located based on the address information of the data block to be read.
[0032] This disclosure also provides an electronic device, including: a memory, a processor, and a communication component; wherein the memory is used to store a computer program;
[0033] The processor is coupled to the memory and the communication component and is used to execute the computer program to perform the steps in the aforementioned data writing methods and/or data reading methods.
[0034] This disclosure also provides a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the aforementioned data writing methods and/or data reading methods.
[0035] This disclosure also provides a computer program product, including a computer program that, when executed by one or more processors, causes the one or more processors to perform the steps in the aforementioned data writing methods and/or data reading methods.
[0036] In this embodiment, the cloud disk client no longer sends write requests to the storage service node; instead, it sends the metadata of the write request to the storage service node. The storage service node can respond to the metadata by selecting a target number of data block replicas for the write request and obtaining the address information of the data block replicas from the metadata storage node. Then, the storage service node returns the address information of the data block replicas to the cloud disk client. In this way, the cloud disk client can directly write the data to be written to the data storage node based on the address information of the data block replicas. This write method eliminates the need for the storage service node to forward the data to be written, reducing the bandwidth amplification required for the storage cluster to serve write requests, thereby reducing the network bandwidth consumption of the storage cluster by write requests and helping to improve the throughput of the storage cluster.
Attached Figure Description
[0037] The accompanying drawings, which are included to provide a further understanding of this disclosure and form part of this disclosure, illustrate exemplary embodiments of the present disclosure and are used to explain the disclosure, but do not constitute an undue limitation of the disclosure. In the drawings:
[0038] Figure 1 is a schematic diagram of the traditional data read and write process;
[0039] Figure 2 is a schematic diagram of the structure of the service system provided in an embodiment of this disclosure;
[0040] Figure 3 is a schematic diagram of the data writing process provided in an embodiment of this disclosure;
[0041] Figure 4 is a schematic diagram of the data reading process provided in an embodiment of this disclosure;
[0042] Figures 5 and 6 are schematic flowcharts of the data writing method provided in the embodiments of this disclosure;
[0043] Figures 7 and 8 are schematic flowcharts of the data reading method provided in the embodiments of this disclosure;
[0044] Figure 9 is a schematic diagram of the structure of the electronic device provided in the embodiment of this disclosure.
Detailed Implementation
[0045] To make the objectives, technical solutions, and advantages of this disclosure clearer, the technical solutions of this disclosure will be clearly and completely described below in conjunction with specific embodiments and corresponding drawings. Obviously, the described embodiments are only a part of the embodiments of this disclosure, and not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of this disclosure without creative effort are within the scope of protection of this disclosure.
[0046] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this disclosure are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals are provided for users to choose to authorize or refuse.
[0047] The terms and concepts involved in the embodiments of this disclosure will be explained below.
[0048] Elastic Block Storage (EBS): Elastic block storage provides low-latency, persistent, and highly reliable block-level random storage for cloud servers. Block storage supports automatic data replication within availability zones, preventing data unavailability due to unexpected hardware failures and protecting user applications from the threat of hardware failure.
[0049] Chunk: A chunk is a basic unit of data storage, typically used to divide large files or data into smaller, manageable parts. Each chunk represents a logical segment of data in the storage system and can be read and written independently. The size of a chunk can be adjusted according to system configuration and requirements.
[0050] Logical Block Address (LBA): LBA is a general mechanism for describing the block of data located on a computer storage device, typically used in auxiliary memory devices such as hard drives. An LBA can be the address of a specific data block or the data block pointed to by a given address.
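Since a write request is described by a starting LBA and a length while data is stored in chunks, a request generally has to be split at chunk boundaries. The disclosure does not specify this mapping; the sketch below assumes a simple linear layout with a hypothetical 512-byte logical block and a configurable chunk size (64 MiB by default, also an assumption).

```python
from typing import List, Tuple

SECTOR_SIZE = 512              # bytes per logical block (assumption)
CHUNK_SIZE = 64 * 1024 * 1024  # bytes per chunk (assumption; configurable per the text)


def split_into_chunks(start_lba: int, length: int,
                      sector_size: int = SECTOR_SIZE,
                      chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Map a byte range beginning at start_lba to (chunk_index, offset_in_chunk, nbytes) pieces,
    assuming chunk i covers bytes [i * chunk_size, (i + 1) * chunk_size) of the cloud disk."""
    pos = start_lba * sector_size
    end = pos + length
    pieces = []
    while pos < end:
        index, offset = divmod(pos, chunk_size)
        nbytes = min(chunk_size - offset, end - pos)  # stop at the chunk boundary
        pieces.append((index, offset, nbytes))
        pos += nbytes
    return pieces


# A 1 KiB write starting at LBA 7 with 4 KiB chunks straddles a chunk boundary.
print(split_into_chunks(7, 1024, chunk_size=4096))  # [(0, 3584, 512), (1, 0, 512)]
```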
[0051] The following provides an exemplary description of the overall architecture and data transmission process of a distributed service system currently supporting cloud disk services. As shown in Figure 1, large-scale cloud computing deployments primarily adopt a storage-compute separation architecture. In this architecture, computing resources and storage resources are managed by independent clusters. Computing cluster 10 consists of multiple computing servers. The computing servers use virtualization technology to divide physical resources into multiple virtual machines (VMs) for user use. Storage cluster 20, on the other hand, consists of dedicated storage servers responsible for persistent data storage. This separation design allows computing and storage to scale independently, improving system flexibility and maintainability.
[0052] On the computing server, virtual machines (VMs) are created using virtualization technology, and logical hard disks (i.e., cloud disks) are created within each VM for user use. User read and write requests within the VMs are forwarded to the backend storage cluster 20 via the cloud disk client 30.
[0053] As shown in Figure 1, the storage cluster 20 is deployed in a layered manner, including: a storage service layer (i.e., storage service system 40) and a persistent storage layer (i.e., data storage system 50). The data storage system 50 includes: multiple data storage nodes 501 and metadata storage nodes (Meta Server, MS) 502. The metadata storage node 502 is primarily responsible for managing and maintaining metadata information related to data storage. The metadata storage node 502 records the specific location information of each data block (Chunk), including the address information of the data storage node 501 where it resides. The data storage nodes 501 are used for persistent storage of specific application data, and data reliability and high availability can be ensured through data redundancy storage. Data redundancy storage can be a multi-replica storage method or an erasure coding storage method.
[0054] The storage service system 40 includes a management node 401 and storage service nodes 402. The management node 401, such as a block master (BM), is the management service at the storage service layer, responsible for managing and scheduling storage service nodes 402, ensuring that data requests are correctly distributed to the appropriate storage service nodes 402. Storage service nodes 402, such as block servers (BS), handle data read/write requests from cloud disk clients and distribute data across multiple data storage nodes 501 in a multi-replica or erasure coding manner.
[0055] The cloud disk client 30 first needs to locate the storage service node 402 serving the cloud disk, such as a block server (BS). This lookup is typically performed through the management node 401. Once the storage service node 402 is determined, the cloud disk client 30 caches the mapping between the cloud disk and the storage service node 402, as well as the address information of the storage service node 402. Based on this mapping and the address information, the cloud disk client 30 can forward subsequent requests directly to the corresponding storage service node 402.
[0056] Specifically, the user performs a write operation on the cloud disk within the VM. As shown in step 1.1 of Figure 1, the write request is sent via the VM to the cloud disk client 30 on the compute server. The cloud disk client 30 determines the target storage service node 402 serving the cloud disk based on its cached mapping between cloud disks and storage service nodes 402, and in step 1.2 sends the write request to the target storage service node 402. As shown in step 1.3 of Figure 1, after receiving the write request, the target storage service node 402 writes the application data carried in the write request to multiple data storage nodes 501 using multi-replica storage or erasure coding, and records the mapping between the LBA carried in the write request and the data storage location.
[0057] The data reading process mainly includes the following. A user initiates a read operation on the cloud disk within the VM. As shown in step 2.1 of Figure 1, the read request is sent via the VM to the cloud disk client 30 on the compute server. The cloud disk client 30 determines the target storage service node 402 serving the cloud disk based on its cached mapping between cloud disks and storage service nodes 402, and forwards the read request to the target storage service node 402, as shown in step 2.2 of Figure 1. The target storage service node 402 uses the LBA in the read request to look up the mapping between LBAs and data storage locations, and thereby determines the data storage location of the data to be read. As shown in step 2.3 of Figure 1, the target storage service node 402 reads the data from the data storage node 501 at that data storage location, and in step 2.4 returns the data to the cloud disk client 30.
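Both paths in Figure 1 depend on the mapping that the storage service node records in step 1.3 and consults in step 2.2. The disclosure does not describe its data structure; the sketch below assumes a hypothetical sorted extent list searched with Python's bisect module, where each extent maps an LBA range to the locations it was written to.

```python
import bisect
from typing import List, Optional, Tuple


class ExtentMap:
    """Hypothetical per-cloud-disk LBA map kept by a storage service node.

    Each written extent [start_lba, start_lba + n_blocks) is stored with the
    data storage locations it was written to. Overlapping rewrites are not
    handled; a real map would split or replace the older extents.
    """

    def __init__(self) -> None:
        self._starts: List[int] = []  # sorted extent start LBAs
        self._extents: List[Tuple[int, int, List[str]]] = []  # (start, n_blocks, locations)

    def record_write(self, start_lba: int, n_blocks: int, locations: List[str]) -> None:
        # Step 1.3: remember where the data for this LBA range was written.
        i = bisect.bisect_left(self._starts, start_lba)
        self._starts.insert(i, start_lba)
        self._extents.insert(i, (start_lba, n_blocks, locations))

    def locate(self, lba: int) -> Optional[List[str]]:
        # Step 2.2: find the extent whose range contains this LBA, if any.
        i = bisect.bisect_right(self._starts, lba) - 1
        if i >= 0:
            start, n_blocks, locations = self._extents[i]
            if lba < start + n_blocks:
                return locations
        return None
```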
[0058] In the physical deployment of the storage cluster, typically one storage service node 402 and one data storage node 501 are deployed on each physical storage server. These physical storage servers are equipped with multiple physical disks, such as solid-state drives (SSDs) or hard disk drives (HDDs), providing high local read and write bandwidth. Multiple physical storage servers are interconnected via a network to form a storage cluster. Currently, the total read and write bandwidth of the disks on the physical storage servers is far higher than the network bandwidth of the storage cluster. Especially when serving cloud disks with high throughput requirements, the cluster's network bandwidth often saturates before other resources, limiting the overall system throughput.
[0059] Based on the above data read and write process, the network bandwidth that the distributed service system requires to complete data read and write requests is analyzed below.
[0060] For a write request, assume the amount of data sent from cloud disk client 30 to storage cluster 20 is BW; the network bandwidth required for storage service node 402 in the storage service layer to receive this data is then BW. Storage service node 402 then replicates the data or splits it into erasure-coded fragments and sends them to multiple data storage nodes 501. Assuming the data redundancy ratio is r, the amount of data in this step becomes BW*r. Therefore, the total network bandwidth required for the write request is BW*(1+r). For multi-replica storage, the data redundancy ratio equals the number of replicas; for example, with 3 replicas, the data redundancy ratio is 3. For erasure coding storage with (M+N) parameters, the data redundancy ratio is (M+N)/M, where M means the data is divided into M data fragments, and N means the M data fragments are encoded with an erasure coding algorithm to obtain N check blocks. For example, for (6+3) erasure coding storage, the data redundancy ratio is (6+3)/6 = 1.5.
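The write-path arithmetic above can be checked with a few lines of Python. The function names are illustrative only; Fraction keeps ratios such as (6+3)/6 exact.

```python
from fractions import Fraction


def replica_ratio(k: int) -> Fraction:
    # K-replica storage: every byte is stored K times, so r = K.
    return Fraction(k)


def erasure_ratio(m: int, n: int) -> Fraction:
    # (M+N) erasure coding: M data fragments plus N check blocks, so r = (M+N)/M.
    return Fraction(m + n, m)


def layered_write_traffic(bw, r: Fraction):
    # BW from the client to the storage service node, then BW*r from the
    # storage service node to the data storage nodes: BW*(1+r) in total.
    return bw + bw * r


print(layered_write_traffic(100, replica_ratio(3)))     # 400
print(layered_write_traffic(100, erasure_ratio(6, 3)))  # 250
```

So a 100 MB/s write stream costs 400 MB/s of storage cluster network bandwidth with 3 replicas and 250 MB/s with (6+3) erasure coding.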
[0061] For a read request, assuming the amount of data that cloud disk client 30 needs to read from the storage cluster is BR, then the network bandwidth required for storage service node 402 to read data from data storage node 501 is BR, and the network bandwidth required for storage service node 402 to return data to cloud disk client 30 is also BR. Therefore, the total network bandwidth required for the read request is 2*BR.
[0062] Based on the above analysis, under the aforementioned layered data storage path architecture (cloud disk client → storage service layer → persistent storage layer), network bandwidth is amplified by a factor of (1+r) on the write path and by a factor of 2 on the read path. For throughput-oriented workloads, this amplification makes the network bandwidth of the storage cluster the main bottleneck to providing higher throughput.
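One way to read this amplification is as a throughput ceiling: if the storage cluster's network bandwidth is the only limit, client-visible throughput is at most that bandwidth divided by the amplification factor. The sketch below additionally assumes a workload that mixes writes and reads in a given proportion; the function name and parameters are hypothetical.

```python
from fractions import Fraction


def max_client_throughput(cluster_bw, write_share, r):
    """Upper bound on client-visible throughput when cluster network bandwidth is the limit.

    write_share: fraction of client traffic that is writes, each amplified by (1 + r);
    the remainder is reads, each amplified by 2. Disk, CPU and other limits are ignored.
    """
    amplification = write_share * (1 + r) + (1 - write_share) * 2
    return cluster_bw / amplification


print(max_client_throughput(Fraction(1000), Fraction(1), Fraction(3)))  # 250
print(max_client_throughput(Fraction(1000), Fraction(0), Fraction(3)))  # 500
```

For example, with 3 replicas a pure write load can turn at most a quarter of the cluster's network bandwidth into client throughput, and a pure read load at most half.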
[0063] Based on the above analysis, the bandwidth amplification on the read/write paths stems mainly from the layering of the storage cluster (cloud disk client → storage service layer → persistent storage layer). However, compared with distributed storage systems without this layering (i.e., systems with only a cloud disk client → persistent storage layer path), layered storage clusters have the following advantages:
[0064] (1) Reduced system complexity and improved scalability: by dividing the storage system into the cloud disk client, the storage service layer (including BS and BM), and the persistent storage layer, each layer has clear responsibilities and interface definitions, which reduces the overall system complexity.
[0065] (2) Enterprise-level feature support: The storage service layer can provide advanced features such as snapshots, replication, migration, multi-mounting, and disaster recovery. If these features are placed in the cloud disk client, it will significantly increase the complexity of its deployment and operation. Since the cloud disk client has limited resources on the computing nodes, this will also increase the difficulty of development and performance optimization.
[0066] (3) Load balancing and Quality of Service (QoS) adjustment: The storage service layer, through the scheduling and coordination of storage service nodes (such as BS) and management nodes (such as BM), can achieve load balancing among storage service nodes, ensuring efficient resource utilization. When handling concurrent requests from multiple users, the storage service layer can perform QoS scheduling based on application service requirements to ensure priority processing of critical application services. If this logic were placed in the cloud disk client or persistent storage layer, the scheduling cost and difficulty would increase significantly because the number of cloud disk clients far exceeds the number of storage service nodes.
[0067] (4) Buffer protection and flow control: The storage service layer can reorder and merge large numbers of concurrent requests, cache hot data, and prefetch, thereby reducing the impact on the persistent storage layer. In extreme cases, the storage service layer can control the flow through mechanisms such as fast discarding or delayed response to ensure the stability and performance of the system.
[0068] Therefore, the data read/write scheme provided in this embodiment retains the layered architecture of the storage cluster while minimizing the network bandwidth consumption of the storage cluster. The main idea is as follows: the cloud disk client no longer sends write requests to the storage service node, but instead sends the metadata of the write request to the storage service node; the storage service node can respond to the metadata by selecting a target number of data block replicas for the write request and obtaining the address information of the data block replicas from the metadata storage node. Then, the storage service node returns the address information of the data block replicas to the cloud disk client. In this way, the cloud disk client can directly write the data to be written to the data storage node based on the address information of the data block replicas. This data writing method eliminates the need for the storage service node to forward the data to be written, reducing the bandwidth amplification required by the storage cluster to serve write requests, thereby reducing the network bandwidth consumption of the storage cluster by write requests and helping to improve the throughput of the storage cluster.
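The effect of removing the storage-service-node hop on write traffic can be estimated with a small sketch. It assumes, beyond what this section states, that the cloud disk client itself sends the r-fold redundant data (replicas or erasure-coded fragments) straight to the data storage nodes and that the metadata exchange is small; under those assumptions the storage cluster's write traffic drops from BW*(1+r) to roughly BW*r. The function names are hypothetical.

```python
from fractions import Fraction


def write_traffic(bw, r, metadata=0):
    """Return (layered, direct) storage-cluster network traffic for a write of size bw.

    layered: client -> storage service node -> data storage nodes, BW*(1+r).
    direct (assumed): metadata to the storage service node, then the client
    sends the r-fold redundant data straight to the data storage nodes, BW*r.
    """
    layered = bw * (1 + r)
    direct = bw * r + metadata
    return layered, direct


def saving(r):
    # Fraction of write traffic saved when metadata traffic is negligible: 1/(1+r).
    layered, direct = write_traffic(1, r)
    return (layered - direct) / layered
```

Under these assumptions the saving is 1/(1+r): 25% for 3 replicas and 40% for (6+3) erasure coding. A design consequence worth noting is that the r-fold fan-out now leaves the compute server, so the client's outbound traffic grows from about BW to about BW*r.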
[0069] The technical solutions provided by the embodiments of this disclosure are described in detail below with reference to the accompanying drawings.
[0070] It should be noted that the same reference numerals in the following figures and embodiments denote the same object or the same step. Therefore, once an object or step is defined in one figure or embodiment, it does not need to be discussed further in subsequent figures and embodiments.
[0071] Figures 2 and 3 are schematic diagrams of the service system provided in the embodiments of this disclosure. As shown in Figures 2 and 3, the service system includes a computing cluster 10 and a storage cluster 20. The computing cluster 10 is equipped with a cloud disk client 30. The storage cluster 20 includes a storage service system 40 and a data storage system 50; the storage service system includes a management node 401 and multiple storage service nodes 402. The data storage system 50 includes multiple data storage nodes 501 and metadata storage nodes 502. For a description of each node, please refer to the relevant content of the foregoing embodiments, which will not be repeated here.
[0072] In this embodiment of the disclosure, to reduce network bandwidth consumption during data read/write processes, as shown in step 1 of Figure 3, the cloud disk client 30 can respond to a write request by sending the metadata of the write request to the first storage service node among the multiple storage service nodes (first storage service node 402a in Figure 3). The metadata of the write request may include the identifier of the first cloud disk carried in the write request, the starting logical block address (LBA) to be written, the length of the data to be written, and so on. The first storage service node 402a is the storage service node, among the multiple storage service nodes 402, that serves the cloud disk to be written by the write request. For ease of description, the cloud disk to be written by the write request is defined as the first cloud disk.
[0073] Specifically, the cloud disk client 30 determines the address information of the first storage service node serving the first cloud disk, and sends the metadata of the write request to the first storage service node 402a based on that address information. In some embodiments, the cloud disk client 30 can use the identifier (ID) of the first cloud disk to query its cached correspondence between cloud disk identifiers and storage service node address information. If the identifier of the first cloud disk is found in the correspondence, the storage service node corresponding to that identifier is used as the first storage service node serving the first cloud disk. If the identifier of the first cloud disk is not found, a query request carrying the identifier of the first cloud disk is sent to the management node 401.
[0074] The management node 401 stores the correspondence between cloud disk identifiers and storage service node address information; in this correspondence, the storage service node corresponding to a cloud disk identifier is the storage service node serving that cloud disk. In response to the query request, the management node 401 uses the identifier of the first cloud disk to look up this correspondence, obtains the address information of the storage service node corresponding to the first cloud disk identifier, and takes that storage service node as the first storage service node serving the first cloud disk. The management node 401 then returns the address information of the first storage service node to the cloud disk client 30, and the cloud disk client 30 sends the metadata of the write request to the first storage service node 402a based on that address information.
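The lookup described in [0073] and [0074], a local cache with a fallback query to the management node, can be sketched as follows. The class name, the callable standing in for the query to management node 401, and the address format are hypothetical; cache invalidation (for example after a cloud disk is moved to another storage service node) is not covered by the text and is omitted.

```python
from typing import Callable, Dict


class ServiceNodeResolver:
    """Hypothetical client-side cache: cloud disk ID -> storage service node address."""

    def __init__(self, query_management_node: Callable[[str], str]) -> None:
        self._cache: Dict[str, str] = {}
        self._query = query_management_node  # stands in for the query to management node 401

    def resolve(self, disk_id: str) -> str:
        address = self._cache.get(disk_id)
        if address is None:
            # Cache miss: ask the management node, then remember its answer.
            address = self._query(disk_id)
            self._cache[disk_id] = address
        return address


# Usage with a fake management node that records how often it is asked.
management_table = {"disk-1": "bs-3:9000"}
queries = []


def fake_query(disk_id: str) -> str:
    queries.append(disk_id)
    return management_table[disk_id]


resolver = ServiceNodeResolver(fake_query)
first = resolver.resolve("disk-1")
second = resolver.resolve("disk-1")  # served from the cache
```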
[0075] In some embodiments, the cloud disk client 30 can generate another write request (defined as a first write request) based on the metadata of the write request. This first write request carries the metadata of the write request but does not contain the data to be written corresponding to the write request. To enable the storage service node 402 to distinguish whether the received write request carries metadata or data to be written, a flag bit can be added to the write request. When generating the first write request, the cloud disk client 30 can write a first identifier indicating that the carried data is metadata into the corresponding flag bit. Correspondingly, when the first storage service node 402a receives the first write request, it can obtain the value of the flag bit; if the value of the flag bit is the first identifier, it is determined that the data carried by the first write request is metadata.
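The flag-bit mechanism of [0075] can be sketched as a request type with a flag field. The concrete flag values, field names and the WriteRequest type are assumptions; the text only states that a first identifier written into the flag bit marks a request whose content is metadata.

```python
from dataclasses import dataclass

# Hypothetical flag values; the text only says a "first identifier" marks metadata.
FLAG_DATA = 0
FLAG_METADATA = 1


@dataclass
class WriteRequest:
    disk_id: str
    start_lba: int
    length: int            # length of the data to be written, in bytes
    flag: int = FLAG_DATA
    payload: bytes = b""


def make_first_write_request(req: WriteRequest) -> WriteRequest:
    # Client side: copy the metadata of the original request, drop the payload, set the flag.
    return WriteRequest(req.disk_id, req.start_lba, req.length, FLAG_METADATA)


def carries_metadata_only(req: WriteRequest) -> bool:
    # Storage service node side: check the flag before deciding how to handle the request.
    return req.flag == FLAG_METADATA


original = WriteRequest("disk-1", 2048, 4096, payload=b"\x00" * 4096)
first = make_first_write_request(original)
```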
[0076] Because the first storage service node 402a receives the metadata of the write request rather than the data to be written, it does not write data to the data storage node 501 itself. Instead, as shown in step 2 of Figure 3, "Selecting a target number (Q) of data block replicas", the first storage service node 402a responds to the metadata of the write request by selecting a target number of data block replicas from the first cloud disk for the write request. In some embodiments, the first storage service node 402a may select the target number (i.e., Q) of data block replicas based on the number of replicas R required by the preset data storage method. This ensures the number of selected data block replicas meets the requirements of the data storage method and increases the probability of successful data writing. In other embodiments, because the data storage method of the first storage service node 402a is preset, the number of replicas R it requires can also be preset. The first storage service node 402a can then select the Q data block replicas for the write request according to this preset R, with the same effect.
[0077] The data storage method is a redundant storage method, either multi-replica storage or erasure coding storage. If the data storage method is multi-replica storage with K replicas (K ≥ 2, K an integer), the required number of replicas is K. If the data storage method is erasure coding storage with erasure coding parameters (M, N), the required number of replicas is (M+N). Here, (M, N) means that the data to be written is divided into M data fragments, and these are encoded with an erasure coding algorithm to obtain N check blocks. This erasure coding storage method can recover from anomalies affecting up to N of the (M+N) encoded blocks. The affected encoded blocks may be data fragments and/or check blocks.
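The rule for the required number of replicas R can be written as a small function. This is a sketch only: the class names `MultiReplica` and `ErasureCoding` are hypothetical representations of the two storage methods.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MultiReplica:
    k: int  # number of full copies, K >= 2


@dataclass(frozen=True)
class ErasureCoding:
    m: int  # number of data fragments
    n: int  # number of check blocks


StorageMethod = Union[MultiReplica, ErasureCoding]


def required_replicas(method: StorageMethod) -> int:
    """Number of data block replicas R required by the redundant storage method."""
    if isinstance(method, MultiReplica):
        if method.k < 2:
            raise ValueError("multi-replica storage needs K >= 2")
        return method.k
    if isinstance(method, ErasureCoding):
        return method.m + method.n  # one replica per data fragment or check block
    raise TypeError(f"unknown storage method: {method!r}")
```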
[0078] In some embodiments, the first storage service node 402a may determine, in response to the metadata of the write request, the number P of data blocks to be requested. In some embodiments, if a default number of data blocks to be written per write is preset, that preset number is used as P.
[0079] In other embodiments, the identifier of the first cloud disk can be obtained from the metadata of the write request, and the load information of the first cloud disk can then be obtained based on this identifier. The load information of the first cloud disk consists of performance indicators that reflect the cloud disk's ability to process requests. It may include: the resource usage of the first cloud disk, the number of read/write requests it is currently handling and the network bandwidth these requests occupy, its throughput, and its number of read/write operations per second (IOPS). Because this load information reflects the first cloud disk's ability to process requests, it can be used to determine how many data blocks the first cloud disk can support for concurrent writing. That number is taken as the number P of data blocks to be requested. Determining P from the load information matches the network bandwidth occupied by writing data blocks to the cloud disk's processing capacity. When the cloud disk's load is high, P can be reduced, which reduces the amount of data written in a single operation and relieves pressure on the cloud disk. When the load is low, P can be increased, which improves the cloud disk's IOPS and throughput.
[0080] In some embodiments, a pre-defined correspondence between the load of a cloud disk and the number of concurrently written data blocks can be established. Based on this correspondence, the load information of the first cloud disk can be queried to obtain the number of concurrently written data blocks that the first cloud disk can support, which is then used as the number P of data blocks to be requested.
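A pre-defined correspondence of this kind can be sketched as a threshold table. This sketch makes several assumptions. First, the load information is collapsed into a single normalized score in [0, 1]. Second, the thresholds and block counts are invented for illustration. Third, the optional `default_p` argument models the preset value described in [0078].

```python
import bisect
from typing import Optional

# Hypothetical correspondence: (upper bound of normalized load, concurrent blocks supported).
LOAD_TO_CONCURRENCY = [
    (0.30, 8),  # lightly loaded disk: request more data blocks
    (0.60, 4),
    (0.85, 2),
    (1.00, 1),  # heavily loaded disk: request a single data block
]


def blocks_to_request(load: float, default_p: Optional[int] = None) -> int:
    """Return P, the number of data blocks to request for one write."""
    if default_p is not None:            # a preset per-write value takes precedence
        return default_p
    load = min(max(load, 0.0), 1.0)      # clamp to the table's range
    bounds = [upper for upper, _ in LOAD_TO_CONCURRENCY]
    idx = bisect.bisect_left(bounds, load)  # first row whose upper bound >= load
    return LOAD_TO_CONCURRENCY[idx][1]
```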
[0081] Furthermore, the first storage service node 402a can determine the target number Q of data block replicas from the number of replicas R required by the preset data storage method and the number P of data blocks to be requested. Specifically, Q can be the product of R and P, that is, Q = R * P, because each data block corresponds to R data block replicas.
[0082] Furthermore, the first storage service node 402a can select the target number of data block replicas (i.e., Q data block replicas) from the first cloud disk for the write request, based on the performance parameter information of the data blocks in the first cloud disk. The performance parameter information of a data block reflects its ability to process requests. It may include the data block's idle resource information, its response latency, and its status information. The status information indicates whether the data block can accept data writes, and may be either a sealed state or a non-sealed state. Data blocks in the sealed state cannot accept writes, while data blocks in the non-sealed state can. The response latency of a data block is determined from its response latency to historical write requests.
[0083] Specifically, the first storage service node 402a can obtain data blocks in an unsealed state from the first cloud disk based on the performance parameter information of the data blocks contained in the first cloud disk, and then obtain data blocks with free storage space greater than or equal to the length of the data to be written from the data blocks in the unsealed state. Afterwards, Q data blocks can be selected sequentially from the data blocks with free storage space greater than or equal to the length of the data to be written, in ascending order of response latency, to serve as Q data block replicas.
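The three selection steps (filter out sealed blocks, require enough free space, take the Q lowest-latency blocks) can be sketched as follows. `BlockInfo` is a hypothetical record for the performance parameter information. Raising an error when fewer than Q blocks qualify is an assumption, because the source does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockInfo:
    block_id: str
    sealed: bool
    free_bytes: int
    latency_ms: float  # derived from the block's latency on historical write requests


def select_replicas(blocks: List[BlockInfo], write_len: int, q: int) -> List[str]:
    """Pick Q block ids: non-sealed, enough free space, lowest latency first."""
    candidates = [b for b in blocks if not b.sealed and b.free_bytes >= write_len]
    candidates.sort(key=lambda b: b.latency_ms)  # ascending response latency
    if len(candidates) < q:
        raise RuntimeError(f"only {len(candidates)} eligible blocks, need {q}")
    return [b.block_id for b in candidates[:q]]
```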
[0084] Selecting the target number of data block replicas from the first cloud disk based on the data blocks' performance parameters means that better-performing data blocks are used for the write. This helps increase the write speed and reduce the response latency of the write request.
[0085] After selecting Q data block replicas, the first storage service node 402a, as shown in step 3 of Figure 3, can request these Q data block replicas from the metadata storage node 502, that is, request that the Q data block replicas be allocated to the aforementioned write request. As shown in step 4 of Figure 3, the metadata storage node 502 can respond to the request from the first storage service node 402a by returning the address information of the target data storage node where the Q data block replicas are located to the first storage service node 402a.
[0086] Specifically, when the first storage service node 402a requests the Q data block replicas from the metadata storage node 502, it can send the identifiers of the Q data block replicas to the metadata storage node 502. The identifier of the data block replica is the identifier of the corresponding data block in the cloud disk. The metadata storage node 502 can obtain the address information of the target data storage node corresponding to each of the Q data block replicas from the pre-stored correspondence between the data block identifiers and the address information of the data storage nodes, and return the address information of the target data storage nodes containing the Q data block replicas to the first storage service node 402a.
[0087] Furthermore, as shown in step 5 of Figure 3, the first storage service node 402a can send the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located to the cloud disk client 30. Thus, as shown in step 6 of Figure 3, "Write the data to be written", the cloud disk client 30 can write the data to be written corresponding to the write request to the target data storage node according to the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located.
[0088] In some embodiments, the first storage service node 402a may also send a preset data storage method to the cloud disk client 30. The cloud disk client 30 may, based on the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located, write the data to be written corresponding to the write request to the target data storage node according to the preset data storage method. Specifically, the first storage service node 402a may encapsulate the identifiers of the aforementioned Q data block replicas, the address information of the target data storage node where the Q data block replicas are located, and the preset data storage method in the same response message and send it to the cloud disk client 30.
[0089] In other embodiments, the cloud disk client 30 is pre-configured with a data storage method. This data storage method is the same as the data storage method pre-configured in the first storage service node 402a. Accordingly, the cloud disk client 30 can write the data to be written corresponding to the write request to the target data storage node according to the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located, in accordance with the pre-configured data storage method.
[0090] Specifically, the first storage service node 402a can obtain the starting LBA to be written from the metadata of the write request; and determine the target data block offset corresponding to the starting LBA according to the first correspondence between the pre-stored LBA and the data block offset. Specifically, the first storage service node 402a can query the first correspondence using the starting LBA and use the data block offset corresponding to the starting LBA in the first correspondence as the target data block offset. Further, the first storage service node 402a can send the target data block offset to the cloud disk client 30. Specifically, the first storage service node 402a can encapsulate the identifiers of the aforementioned Q data block replicas, the address information of the target data storage node where the Q data block replicas are located, and the preset data storage method, along with the target data block offset, in the same response message and send it to the cloud disk client 30.
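The single response message described above can be sketched as a record assembled from two lookups. The first lookup is the metadata storage node's data block to data storage node correspondence from [0086]. The second is the first correspondence from starting LBA to data block offset. `WriteGrant`, its field names, and the string encoding of the storage method are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WriteGrant:
    replica_ids: List[str]
    node_addrs: Dict[str, str]  # replica id -> target data storage node address
    storage_method: str         # hypothetical encoding, e.g. "replica:3" or "ec:4+2"
    block_offset: int           # target offset inside the data block


def build_write_grant(
    replica_ids: List[str],
    block_locations: Dict[str, str],
    lba_to_offset: Dict[int, int],
    start_lba: int,
    storage_method: str,
) -> WriteGrant:
    """Assemble the response returned to the cloud disk client.

    block_locations: data block id -> data storage node address (metadata storage node).
    lba_to_offset:   the first correspondence, starting LBA -> data block offset.
    """
    addrs = {rid: block_locations[rid] for rid in replica_ids}  # KeyError if unknown
    offset = lba_to_offset[start_lba]
    return WriteGrant(list(replica_ids), addrs, storage_method, offset)
```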
[0091] Accordingly, the cloud disk client 30 can write the data to be written to the target data storage node according to the aforementioned data storage method, based on the identifiers of the Q data block replicas, the offset within the target data block, and the address information of the target data storage node where the Q data block replicas are located.
[0092] Specifically, the cloud disk client 30 can determine the data block replicas from the target data storage node based on the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located; and write the data to be written into the data block replica starting from the offset within the target data block of the data block replica.
[0093] In this embodiment, the cloud disk client no longer sends write requests to the storage service node; instead, it sends the metadata of the write request to the storage service node. The storage service node can respond to the metadata by selecting a target number of data block replicas for the write request and obtain the address information of the data block replicas from the metadata storage node. Then, the storage service node returns the address information of the data block replicas to the cloud disk client. In this way, the cloud disk client can directly write the data to be written to the data storage node based on the address information of the data block replicas. This write method eliminates the need for the storage service node to forward the data to be written, reducing the bandwidth amplification required for the storage cluster to serve write requests, thereby reducing the network bandwidth consumption of the storage cluster by write requests and helping to improve the throughput of the storage cluster.
[0094] The network bandwidth required by the data writing method of this embodiment is analyzed below. For a write request, assume the amount of data to be written that the cloud disk client sends to the storage cluster is BW. Assume also that the data redundancy ratio of the preset data storage method is r, where r > 1. Because the cloud disk client writes the data to be written directly to the data storage nodes according to the preset data storage method, the amount of data sent to the storage cluster is BW*r. The metadata of the write request is much smaller than the data to be written and can be ignored. The network bandwidth of the storage cluster required for the write request is therefore BW*r. By comparison, the data writing method shown in Figure 1 above requires BW*(1+r) of storage cluster network bandwidth per write request. The data writing method of this embodiment therefore saves BW of network bandwidth, reducing the write bandwidth by a ratio of approximately 1 - r/(r+1) = 1/(r+1).
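The comparison above can be worked through explicitly, with two numeric illustrations. The illustrations assume r = K for K-replica storage and r = (M+N)/M for erasure coding; the source does not state these redundancy ratios.

```latex
\begin{aligned}
B_{\text{forward}} &= BW + BW \cdot r = BW\,(1+r) \\
B_{\text{direct}}  &= BW \cdot r \\
\frac{B_{\text{forward}} - B_{\text{direct}}}{B_{\text{forward}}}
  &= \frac{BW}{BW\,(1+r)} = \frac{1}{1+r} = 1 - \frac{r}{r+1} \\
r = 3 \ (\text{three replicas}) &\;\Rightarrow\; \tfrac{1}{4} = 25\% \\
r = \tfrac{4+2}{4} = 1.5 \ (\text{erasure coding } (4,2)) &\;\Rightarrow\; \tfrac{1}{2.5} = 40\%
\end{aligned}
```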
[0095] In this embodiment of the disclosure, the number Q of data block replicas is an integer multiple of the number of replicas R required by the preset data storage method. The multiple is the number P of requested data blocks, that is, Q = R * P. If P = 1, Q equals R.
[0096] For the embodiment where P=1, the cloud disk client 30 can directly write the data to be written into the Q data block replicas in the target data storage node according to the aforementioned data storage method, based on the identifiers of the Q data block replicas, the offset within the target data block, and the address information of the target data storage node where the Q data block replicas are located.
[0097] For example, in some embodiments, the data storage method is an erasure coding storage method with erasure coding parameters of (M, N). The cloud disk client 30 can divide the data to be written into M data fragments according to the erasure coding parameters, and encode the M data fragments using an erasure coding algorithm to obtain N check blocks; then, according to the identifiers of the target number of data block replicas, the offset within the target data block, and the address information of the target data storage node, the M data fragments and the N check blocks can be written into (M+N) data block replicas of the target data storage node; each data block replica is written with one data fragment or one check block.
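The client-side write plan for erasure coding can be sketched as follows, with one deliberate simplification. A single XOR parity block (N = 1) stands in for a real erasure coding algorithm such as Reed-Solomon, which is what the general (M, N) case needs. The function names are hypothetical, and actually sending each encoded block to its data storage node is left out.

```python
from typing import Dict, List, Tuple


def split_into_fragments(data: bytes, m: int) -> List[bytes]:
    """Split data into M equal-length fragments, zero-padding the last one."""
    frag_len = -(-len(data) // m)  # ceiling division
    padded = data.ljust(frag_len * m, b"\x00")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(m)]


def xor_parity(fragments: List[bytes]) -> bytes:
    """One check block: byte-wise XOR of all fragments (an N = 1 stand-in)."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return bytes(parity)


def plan_ec_write(data: bytes, m: int, replica_ids: List[str],
                  offset: int) -> Dict[str, Tuple[int, bytes]]:
    """Assign each of the M + 1 encoded blocks to exactly one data block replica.

    Returns {replica_id: (offset_in_block, encoded_block)}.
    """
    fragments = split_into_fragments(data, m)
    encoded = fragments + [xor_parity(fragments)]
    if len(replica_ids) != len(encoded):
        raise ValueError("need exactly M + N data block replicas")
    return {rid: (offset, blk) for rid, blk in zip(replica_ids, encoded)}
```

With XOR parity, any one lost fragment can be rebuilt by XOR-ing the parity block with the surviving fragments.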
[0098] As another example, if the data storage method is multi-replica storage with K replicas, the cloud disk client 30 can write the data to be written into the K data block replicas of the target data storage node. It does so according to the identifiers of the target number of data block replicas, the offset within the target data block, and the address information of the target data storage node. Each data block replica receives a full copy of the data to be written.
[0099] In other embodiments, P≥2, that is, there are multiple data blocks to be requested, and the data to be written only needs to be written to R data block replicas. In order to select the R data block replicas required for the data to be written, the first storage service node 402a can sort the P data blocks to be requested according to the performance parameter information of the data block replicas corresponding to each of the P data blocks to be requested, so as to obtain the sorting result of the P data blocks to be requested.
[0100] In some embodiments, for any data block Ci, the first storage service node 402a can compute a weighted sum of the performance parameters of the data block replicas corresponding to Ci, using preset weights, to obtain a performance score for data block Ci. It can then sort the P data blocks to be requested by performance score, from highest to lowest, to obtain their sorting result. The first storage service node 402a can also send this sorting result to the cloud disk client 30. Specifically, it can encapsulate the following in the same response message returned to the cloud disk client 30: the sorting result, the identifiers of the Q data block replicas, the address information of the target data storage nodes where the Q data block replicas reside, the preset data storage method, and the offset within the target data block.
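The weighted scoring and sorting can be sketched as follows. The chosen parameters (an idle-space ratio and latency) and the weight values are assumptions for illustration. Latency gets a negative weight so that slower replicas lower the score, which keeps "highest score first" meaning "best performance first".

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicaPerf:
    free_ratio: float  # idle storage as a fraction, 0..1 (higher is better)
    latency_ms: float  # recent response latency (lower is better)


# Hypothetical preset weights.
WEIGHTS = {"free_ratio": 10.0, "latency_ms": -1.0}


def block_score(replicas: List[ReplicaPerf]) -> float:
    """Weighted sum over the R replicas that back one data block Ci."""
    return sum(
        WEIGHTS["free_ratio"] * r.free_ratio + WEIGHTS["latency_ms"] * r.latency_ms
        for r in replicas
    )


def rank_blocks(perf: Dict[str, List[ReplicaPerf]]) -> List[str]:
    """Sort the P requested data blocks by performance score, highest first."""
    return sorted(perf, key=lambda block_id: block_score(perf[block_id]), reverse=True)
```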
[0101] The cloud disk client 30 can prioritize writing the data to be written to the data block replicas corresponding to the data blocks with the highest ranking based on the sorting results. Specifically, the cloud disk client 30 can determine the data block Ci located at the first position based on the sorting results, and obtain the identifiers of the R target data block replicas corresponding to data block Ci from the identifiers of the Q data block replicas, and obtain the address information of the target data storage nodes where the R target data block replicas are located from the address information of the target data storage nodes. Further, the cloud disk client 30 can write the data to be written to the R target data block replicas according to the aforementioned preset data storage method based on the identifiers of the R target data block replicas and the address information of the target data storage nodes where the R target data block replicas are located.
[0102] If writing the data fails, the cloud disk client 30 selects the R target data block replicas corresponding to the second-ranked data block Cj for data writing. The writing method is the same as described above for the R target data block replicas of the first-ranked data block Ci and is not repeated here. If that write also fails, the client selects the R target data block replicas corresponding to the next-ranked data block and continues writing. This repeats until the data to be written is successfully written or all P data blocks have failed.
[0103] The above data writing process can be summarized as follows: The cloud disk client 30, based on the sorting results, determines the top-ranked target data block among the data blocks that have not yet been attempted to write data. Then, it obtains the identifiers of R target data block replicas corresponding to the target data block from the identifiers of the target number of data block replicas; and obtains the address information of the target data storage nodes where the R target data block replicas are located from the address information of the target data storage nodes. Then, based on the identifiers of the R target data block replicas and the address information of the target data storage nodes where the R target data block replicas are located, it writes the data to be written to the R target data block replicas according to the aforementioned preset data storage method. If the writing of the data to be written to the R target data block replicas fails, the operation of determining the top-ranked target data block among the data blocks that have not yet been attempted to write data is re-executed, until the data to be written is successfully written or all P data blocks fail to be written.
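The failover loop summarized above can be sketched as follows. `write_to_replicas` is a hypothetical callback that writes the data to one block's R replicas using the preset storage method and reports success. The failed-block list it returns is the information the client later reports to the storage service node.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def write_with_failover(
    ranking: Sequence[str],
    replicas_of: Dict[str, List[str]],
    write_to_replicas: Callable[[List[str]], bool],
) -> Tuple[Optional[str], List[str]]:
    """Try data blocks in ranked order until one accepts the write.

    ranking:           block ids, best first (the sorting result)
    replicas_of:       block id -> ids of its R target data block replicas
    write_to_replicas: hypothetical callback; True if all R replicas were written
    Returns (block id that succeeded, or None if all failed; blocks that failed).
    """
    failed: List[str] = []
    for block_id in ranking:  # top-ranked block not yet attempted
        if write_to_replicas(replicas_of[block_id]):
            return block_id, failed
        failed.append(block_id)
    return None, failed       # all P data blocks failed
```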
[0104] For example, assume P = 3, the three data blocks to be requested are {C1, C2, C3}, and their sorting result is {C2, C1, C3}. The cloud disk client 30 first writes the data to be written to the data block replicas corresponding to data block C2. If writing to C2 fails, it writes the data to the data block replicas corresponding to C1. If writing to C1 fails, it writes the data to the data block replicas corresponding to C3. If writing to C3 also fails, all three data blocks are determined to have failed.
[0105] In this embodiment, the cloud disk client 30 prioritizes writing data to data blocks with better performance based on the performance ranking of the data blocks. This can improve the data writing speed, reduce the response latency of data writing, and increase the probability of successful data writing.
[0106] If the data to be written is successfully written, the cloud disk client 30 can send write success metadata to the first storage service node 402a. This write success metadata includes the amount of data written to each data storage node and the response latency of the write request. The first storage service node 402a updates the performance parameter information of the data block replica based on the amount of data written to each data storage node and the response latency of the write request. The updated performance parameter information of the data block replica can provide a reference for subsequent write requests in selecting the data block replica.
[0107] Specifically, the first storage service node 402a can update the free storage resource information of the data block replica in the target data storage node according to the amount of data the write request wrote to that node. For example, it can subtract the amount of data written to the data block replica from the replica's original free storage space to obtain the new free storage resource information. The first storage service node 402a can also update the response latency of the data block replica in the target data storage node according to the response latency of the write request.
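The update can be sketched as follows. The free-space update follows the text directly. For latency, the source only says it is updated from the write's response latency. The exponentially weighted moving average and its smoothing factor `ALPHA` used here are assumptions.

```python
from dataclasses import dataclass

ALPHA = 0.2  # hypothetical smoothing factor for the latency estimate


@dataclass
class ReplicaState:
    free_bytes: int
    latency_ms: float
    sealed: bool = False


def apply_write_success(state: ReplicaState, bytes_written: int,
                        observed_latency_ms: float) -> None:
    """Update one replica's performance parameters from write-success metadata."""
    # New free space = original free space minus the amount just written.
    state.free_bytes = max(state.free_bytes - bytes_written, 0)
    # Blend the new observation into the stored latency (exponential moving average).
    state.latency_ms = (1 - ALPHA) * state.latency_ms + ALPHA * observed_latency_ms
```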
[0108] In some embodiments, the write success metadata further includes: the starting LBA carried in the write request, the identifier of the successfully written data block, and the offset within the successfully written data block. Accordingly, the first storage service node 402a can establish a second correspondence between the starting LBA of the write request and the identifier and offset within the successfully written data block, and persistently store the second correspondence. Optionally, the first storage service node 402a can persistently store the second correspondence in the data storage system 50. For example, the first storage service node 402a can persistently store the second correspondence in the metadata storage node 502 or the data storage node 501. By persistently storing the second correspondence, when a read request arrives, the LBA carried in the read request can be used to query the persistently stored second correspondence to obtain the identifier and offset within the data block corresponding to the LBA carried in the read request. The data required by the read request can then be read based on the identifier and offset within the data block corresponding to the LBA carried in the read request. In other words, persistently storing the second correspondence provides a basis for subsequent read requests to read data.
[0109] In some embodiments, if the data to be written is successfully written or if all P data blocks fail to be written, the cloud disk client 30 can send the identifier of the failed data block replica to the first storage service node 402a. Correspondingly, the first storage service node 402a can modify the status information in the performance parameter information of the failed data block replica to a sealed state based on the identifier of the failed data block replica. This can provide a reference for selecting data block replicas for subsequent write requests. For example, when selecting data block replicas for subsequent write requests, it can avoid data block replicas in a sealed state, which helps increase the probability of successful writes for subsequent write requests. In embodiments where the data to be written is successfully written, the cloud disk client 30 can carry the identifier of the failed data block replica in the aforementioned successful write metadata. This reduces the number of network communications between the cloud disk client 30 and the storage service node, and lowers the frequency of I/O access to the storage service node.
[0110] In the embodiment where all P data blocks fail to be written, the cloud disk client 30 can also resend the metadata of the write request to the first storage service node 402a to trigger the first storage service node 402a to reselect a data block replica for the write request, that is, to re-execute the data writing process provided in the aforementioned embodiment, which can reduce the probability of data loss and ensure the high availability of the data storage service.
[0111] In some embodiments, the service system can also support concurrent write requests; that is, the write requests described above may be multiple concurrent write requests. In this case, the first storage service node is configured to distribute the multiple write requests to the P data blocks using a load balancing strategy, which helps balance the load across the data blocks. The specific form of the load balancing strategy is not limited in this embodiment. Exemplary implementations of distributing multiple write requests to P data blocks under a centralized load balancing strategy are described below. The load balancing strategy may be:
[0112] (1) Round-robin load balancing: write requests are allocated to different data blocks in turn, in order of arrival. For example, write request 1 is allocated to data block C1, write request 2 to data block C2, and so on. If the number of write requests S equals P, each data block is assigned one write request. If S is less than P, the top S data blocks are selected from the P data blocks according to the aforementioned sorting result, and the S write requests are allocated to these S data blocks, one write request per data block. If S is greater than P, the first P write requests are allocated to the P data blocks. The identifiers of the P data blocks and the address information of their corresponding replicas are returned to the cloud disk clients that sent these P write requests, so that the clients can perform the data writes of the P write requests concurrently. For the specific data writing process, refer to the relevant content of the aforementioned embodiments. For the remaining (S-P) write requests, the first storage service node can reselect data block replicas according to the method provided in the aforementioned embodiments, which is not repeated here.
[0113] (2) Weighted load balancing: each data block is assigned a weight that sets its selection probability, so that the share of write requests a data block receives is proportional to its weight. The higher a data block's weight, the more likely it is to be selected. Specifically, the first storage service node can distribute the multiple write requests to the P data blocks according to the weights of the P data blocks.
[0114] (3) Internet Protocol (IP) hash (IP_Hash) balancing: the source IP address of each write request is hashed, and each request is allocated to a data block according to the hash result. As a result, requests from the same user always go to the same data block.
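The three strategies can be sketched as follows; the function names are hypothetical. For IP hashing, `zlib.crc32` is a stand-in for the unspecified hash function. It is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would break the "same IP, same block" property across restarts.

```python
import random
import zlib
from typing import Dict, List, Sequence, Tuple


def round_robin(requests: Sequence[str],
                ranked_blocks: Sequence[str]) -> Tuple[Dict[str, str], List[str]]:
    """Assign one request per block in ranked order; overflow goes back for reselection."""
    p = len(ranked_blocks)
    assigned = dict(zip(requests[:p], ranked_blocks))  # if S < P, only the top S blocks are used
    leftover = list(requests[p:])                       # the remaining S - P requests
    return assigned, leftover


def weighted_pick(blocks: Sequence[str], weights: Sequence[float],
                  rng: random.Random) -> str:
    """Choose a block with probability proportional to its weight."""
    return rng.choices(blocks, weights=weights, k=1)[0]


def ip_hash_pick(source_ip: str, blocks: Sequence[str]) -> str:
    """Deterministically map a client's source IP address to one block."""
    return blocks[zlib.crc32(source_ip.encode()) % len(blocks)]
```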
[0115] The load balancing method illustrated in the foregoing embodiments is merely illustrative and does not constitute a limitation. After allocating multiple write requests to P data blocks, the first storage service node can return the identifier of the data block replica corresponding to each write request, the address information of the target data storage node where these data block replicas are located, and the preset data storage method to the cloud disk client 30 that sent the write request. The cloud disk client 30 can write the data to be written corresponding to the write request into these data block replicas according to the identifier of the data block replica, the address information of the target data storage node where these data block replicas are located, and the preset data storage method. For specific data writing methods, please refer to the relevant content in the foregoing embodiments, which will not be repeated here.
[0116] In addition to reducing write bandwidth, the embodiments of this disclosure can also reduce data read bandwidth. The data read method provided by the embodiments of this disclosure will be described exemplarily below.
[0117] When a user needs to read data, a read request can be sent to the cloud disk client via the VM. This read request may include: the identifier of the cloud disk to be read (defined as the second cloud disk), the starting LBA of the data to be read, and its length. Further, as shown in step 1 of Figure 4, the cloud disk client 30 can send a read request to the storage service node 402b serving the second cloud disk (defined as the second storage service node). For a detailed implementation of how the cloud disk client 30 determines the second storage service node 402b, please refer to the aforementioned content regarding the cloud disk client 30 determining the first storage service node 402a; it will not be repeated here.
[0118] Accordingly, as shown in step 2 of Figure 4, the second storage service node 402b can respond to a read request by obtaining the address information of the data block to be read. Specifically, the second storage service node 402b can obtain the target LBA and target data length corresponding to the data to be read from the read request. The target LBA is the starting LBA of the data to be read. The second storage service node 402b can determine the identifier of the data block corresponding to the target LBA and the offset within the data block from the second correspondence between the stored starting LBA, the identifier of the successfully written data block, and the offset within the successfully written data block.
[0119] Furthermore, the second storage service node 402b can use a quadruple as the address information of the data block to be read. The quadruple consists of the target LBA, the target data length, the identifier of the data block corresponding to the target LBA, and the offset within that data block. It can be written as (LBA, len, Ci, Offset), where LBA is the starting LBA carried in the read request (i.e., the target LBA), len is the target data length, Ci is the identifier of the data block corresponding to the target LBA, and Offset is the offset within the data block. The data requested by a read request may consist of multiple data fragments, each with its own starting LBA, and their lengths may be the same or different. Different data fragments may also be stored in different data blocks. The address information of the data blocks to be read may therefore comprise multiple quadruples, represented as an address list such as [(LBA1, len1, C1, Offset1), (LBA2, len2, C2, Offset2), ..., (LBAk, lenk, Ck, Offsetk)], where k is the number of data blocks to be read.
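Building the quadruple list from the second correspondence can be sketched as follows. This sketch assumes that each entry also records the length of the extent written, and that LBAs, offsets, and lengths all use the same unit (sectors). The source stores only the starting LBA, block identifier, and offset. Reading a range that was never written raises an error here, which is also an assumption.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical extended second correspondence:
# starting LBA -> (block id, offset in block, extent length), all in sectors.
Extent = Tuple[str, int, int]


def resolve_read(extents: Dict[int, Extent], lba: int,
                 length: int) -> List[Tuple[int, int, str, int]]:
    """Return the (LBA, len, Ci, Offset) quadruples covering [lba, lba + length)."""
    starts = sorted(extents)
    end = lba + length
    i = bisect.bisect_right(starts, lba) - 1  # extent whose start is <= lba
    if i < 0:
        raise KeyError(f"LBA {lba} was never written")
    out: List[Tuple[int, int, str, int]] = []
    cur = lba
    while cur < end:
        if i >= len(starts):
            raise KeyError(f"LBA {cur} was never written")
        start = starts[i]
        block_id, block_off, ext_len = extents[start]
        if not start <= cur < start + ext_len:  # hole between extents
            raise KeyError(f"LBA {cur} was never written")
        piece = min(end, start + ext_len) - cur
        out.append((cur, piece, block_id, block_off + (cur - start)))
        cur += piece
        i += 1
    return out
```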
[0120] Furthermore, the second storage service node 402b can return the address information of the data block to be read to the cloud disk client 30. The cloud disk client 30 can read data from the data storage node where the data block to be read is located based on the address information of the data block to be read. Specifically, the cloud disk client 30 can determine the data block corresponding to the target LBA from the data storage node where the data block to be read is located based on the identifier of the data block corresponding to the target LBA, and then read data of the target data length starting from the offset within the data block corresponding to the target LBA.
[0121] In this embodiment, the storage service node responds to a read request by returning the address information of the data block to be read to the cloud disk client. The cloud disk client then reads the data directly from the data storage node based on the address information of the data block. This data reading method eliminates the need for the storage service node to read data from the data storage node and then forward the data to the cloud disk client. This reduces the bandwidth required for the storage cluster to serve read requests, thereby reducing the network bandwidth consumption of the storage cluster by read requests and helping to improve the throughput of the storage cluster.
[0122] The following analysis explains the network bandwidth required by the data reading method provided in this embodiment. For a read request, assume the amount of data the cloud disk client needs to read from the storage cluster is BR. Because the cloud disk client reads the data directly from the data storage node of the persistent storage layer (i.e., the data storage system), the required network bandwidth is BR. The data reading method shown in Figure 1 above requires 2*BR of storage cluster network bandwidth for the same read request, since the data travels from the data storage node to the storage service node and then on to the cloud disk client. The data reading method provided in this embodiment therefore saves BR of network bandwidth, i.e., read bandwidth is reduced by approximately 50%.
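The arithmetic of [0122] can be written as a small Python model. It is a sketch that counts only the data payload and ignores the small address-information exchange, which is why the source describes the saving as approximately 50%. The function names are hypothetical.

```python
def read_bandwidth(b_r: float, direct: bool) -> float:
    """Storage-cluster network bandwidth consumed by one read of b_r units of data.

    Forwarding through a storage service node (Figure 1) moves the data twice:
    data storage node -> storage service node -> cloud disk client.
    A direct read moves it once: data storage node -> cloud disk client.
    """
    return b_r if direct else 2 * b_r


def read_saving_ratio(b_r: float) -> float:
    """Fraction of read bandwidth saved by direct reads relative to forwarding."""
    forwarded = read_bandwidth(b_r, direct=False)
    return (forwarded - read_bandwidth(b_r, direct=True)) / forwarded
```

For BR = 100, forwarding costs 200, a direct read costs 100, and the saving ratio is 0.5.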
[0123] The data read/write processes of the foregoing embodiments show that the data read/write method provided in this disclosure saves network bandwidth compared with the method shown in Figure 1. However, the advantages of the layered storage cluster deployment described above (cloud disk client, storage service layer, persistent storage layer) must also be considered, so in some scenarios the data read/write method shown in Figure 1 is still used. Request processing for these scenarios is described in detail below.
[0124] Cloud disk clients run on compute nodes with limited resources. Allowing them to access the persistent storage layer directly (i.e., to read and write data on data storage nodes directly) would, in the most extreme case, require each client to establish a network connection with every data storage node, placing a significant burden on the resources of both the cloud disk client and the persistent storage layer. With the layered storage cluster approach described above, a cloud disk client only needs to connect to a subset of storage service nodes, which in turn communicate with the data storage nodes, reducing the number of connections between cloud disk clients and the persistent storage layer. Therefore, to balance direct reads and writes by the cloud disk client against forwarding through storage service nodes, a threshold control mechanism can be introduced.
[0125] Specifically, threshold control can be applied to the number of connections between the cloud disk client and the persistent storage layer. When the number of connections between the cloud disk client and the data storage nodes is less than a set threshold, the client can establish a connection directly with a data storage node to read and write data. If the number of connections is greater than or equal to the set threshold, the read/write request is forwarded to a storage service node, which performs the data read or write. Specifically, the cloud disk client 30 can obtain its current number of connections with the data storage nodes 501. If the current number of connections between the cloud disk client 30 and the data storage nodes 501 is less than the set threshold, it performs the operation of sending the metadata of the write request to the first storage service node 402a in response to the write request; that is, the write request is processed by the method of the aforementioned embodiment in which the cloud disk client 30 writes data directly to the data storage node. If the current number of connections is greater than or equal to the set threshold, the cloud disk client 30 sends the write request itself to the first storage service node 402a. In response, the first storage service node 402a writes the data to be written to the target data storage node according to the preset data storage method, based on the identifiers of the target number of data block replicas and the address information of the target data storage node.
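The client-side write decision in [0125] reduces to one comparison. The Python sketch below is an illustration, not the disclosed implementation: it assumes the client counts its open connections to data storage nodes, and the enum values and `choose_write_path` are hypothetical names.

```python
from enum import Enum


class WritePath(Enum):
    # Client sends only the write metadata, then writes the data to data storage nodes itself.
    DIRECT = "direct"
    # Client sends the full write request; the storage service node writes the data.
    FORWARD = "forward"


def choose_write_path(current_connections: int, threshold: int) -> WritePath:
    """Pick the write path from the client's current data-storage-node connection count."""
    return WritePath.DIRECT if current_connections < threshold else WritePath.FORWARD
```

A client tracking connected nodes in a set could call `choose_write_path(len(connected_nodes), threshold)`; reaching the threshold exactly already switches to forwarding, matching "greater than or equal to" in the text.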
[0126] Similarly, for a read request, if the number of connections between the cloud disk client and the data storage nodes is less than the set threshold, the cloud disk client 30 can send the second storage service node 402b a read request carrying a first flag, instructing the second storage service node to return the address information of the data block to be read. From the first flag, the second storage service node 402b determines that the cloud disk client will read the data directly from the data storage node. Accordingly, the second storage service node 402b determines the address information of the data block to be read and returns it to the cloud disk client. For details, refer to the relevant content in the foregoing embodiments.
[0127] Accordingly, if the number of connections between the cloud disk client and the data storage nodes is greater than or equal to the set threshold, the cloud disk client 30 can send the second storage service node 402b a read request carrying a second flag, instructing the second storage service node 402b to read the data from the data storage node where the data block to be read is located, based on the address information of that data block. From the second flag, the second storage service node 402b determines that it is to perform the read itself. Accordingly, the second storage service node 402b determines the address information of the data block to be read in the read request and reads the data from the data storage node where that data block is located. Afterwards, the second storage service node 402b returns the read data to the cloud disk client 30.
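The two read flags in [0126] and [0127] can be sketched as a client-side flag choice plus a storage-service-node dispatch. This is a minimal Python illustration under stated assumptions: `read_block(block_id, offset, length)` is a hypothetical stand-in for fetching bytes from a data storage node, and offsets and lengths are taken to be in bytes.

```python
from enum import Enum
from typing import Callable, List, Tuple, Union

Quad = Tuple[int, int, str, int]  # (LBA, len, Ci, Offset)


class ReadFlag(Enum):
    FIRST = 1   # client reads directly: return address information only
    SECOND = 2  # storage service node reads the data and returns it


def client_read_flag(current_connections: int, threshold: int) -> ReadFlag:
    """Cloud disk client: choose the flag carried in the read request."""
    return ReadFlag.FIRST if current_connections < threshold else ReadFlag.SECOND


def handle_read(flag: ReadFlag,
                addresses: List[Quad],
                read_block: Callable[[str, int, int], bytes]) -> Union[List[Quad], bytes]:
    """Second storage service node: dispatch on the flag carried in the read request."""
    if flag is ReadFlag.FIRST:
        return addresses
    # SECOND flag: fetch each fragment from its data storage node, in order.
    return b"".join(read_block(block_id, offset, length)
                    for _lba, length, block_id, offset in addresses)
```

With the first flag the node returns the quadruple list unchanged; with the second it returns the concatenated fragment data, so the client receives the data itself.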
[0128] In this embodiment, the threshold control mechanism enables autonomous switching between a mode in which the cloud disk client reads and writes data storage nodes directly and a mode in which the storage service node reads and writes them. This retains the advantages and simplicity of tiered storage and remains compatible with the traditional read/write logic of storage service nodes. It preserves the performance advantage of direct connections between the cloud disk client and data storage nodes (i.e., reduced network bandwidth consumption on the storage cluster along the data path) while avoiding the excessive resource consumption caused by too many such connections, making it an effective resource management and performance optimization strategy.
[0129] In other embodiments, the storage service layer (i.e., the storage service system) provides functions such as snapshots, replication, migration, multi-mounting, and disaster recovery. Moving these functions into the cloud disk client would significantly increase the complexity of its deployment and operation, whereas having the storage service node perform the read and write requests belonging to these operations keeps their deployment and operation simple. Therefore, the cloud disk client 30 can be preconfigured with a set of data processing operations; if a read or write request implements one of these preset operations, the storage service node performs the data read or write. The preset data processing operations can include snapshot creation, data replication, data migration, cloud disk multi-mounting, data disaster recovery, and the like.
[0130] For write requests, the cloud disk client 30 can identify the data processing operation that a write request implements. If the write request does not implement a preset data processing operation, the client responds to it by sending the metadata of the write request to the first storage service node 402a; that is, the write request is processed by the method of the aforementioned embodiment in which the cloud disk client 30 writes data directly to the data storage node. If the write request does implement a preset data processing operation, the cloud disk client 30 sends the write request itself to the first storage service node 402a. In response, the first storage service node 402a writes the data to be written to the target data storage node according to the preset data storage method, based on the identifiers of the target number of data block replicas and the address information of the target data storage node.
[0131] Similarly, for a read request that does not belong to a preset data processing operation, the cloud disk client 30 can send the second storage service node 402b a read request carrying the first flag, instructing the second storage service node to return the address information of the data block to be read. From the first flag, the second storage service node 402b determines that the cloud disk client will read the data directly from the data storage node. Accordingly, the second storage service node 402b determines the address information of the data block to be read in the read request and returns it to the cloud disk client. For details, refer to the relevant content of the foregoing embodiments.
[0132] Accordingly, if the read request belongs to a preset data processing operation, the cloud disk client 30 can send the second storage service node 402b the read request carrying the second flag, instructing the second storage service node 402b to read the data from the data storage node where the data block to be read is located, based on the address information of that data block. From the second flag, the second storage service node 402b determines that it is to perform the read itself. Accordingly, it determines the address information of the data block to be read in the read request, reads the data from the data storage node where that data block is located, and then returns the read data to the cloud disk client 30.
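The operation-based routing in [0129] through [0132] can be sketched in Python as a membership test against the preset operations. The operation tag strings, the assumption that ordinary VM I/O carries no tag (`None`), and all function names are hypothetical.

```python
from typing import Optional

# Hypothetical tags for the preset data processing operations listed in [0129].
PRESET_OPERATIONS = frozenset(
    {"snapshot", "replication", "migration", "multi_mount", "disaster_recovery"}
)

FIRST_FLAG, SECOND_FLAG = 1, 2  # read flags from [0131] and [0132]


def is_preset_operation(operation: Optional[str]) -> bool:
    # Ordinary I/O from the VM is assumed to carry no operation tag (None).
    return operation in PRESET_OPERATIONS


def write_message(operation: Optional[str]) -> str:
    """What the client sends to the first storage service node for a write."""
    return "full_write_request" if is_preset_operation(operation) else "write_metadata"


def read_flag(operation: Optional[str]) -> int:
    """Flag the client attaches to a read sent to the second storage service node."""
    return SECOND_FLAG if is_preset_operation(operation) else FIRST_FLAG
```

Keeping the preset set in one frozen collection means that adding an operation, such as a new replication mode, changes the routing of both reads and writes together.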
[0133] This approach allows the cloud disk client to switch automatically between reading and writing data storage nodes directly and having storage service nodes read and write them, retaining the advantages and simplicity of tiered storage and remaining compatible with the traditional read/write logic of storage service nodes. When read/write requests implement the preset data processing operations, having the storage service nodes access the data storage nodes facilitates the implementation of these operations and reduces the complexity and operational cost of implementing and deploying them.
[0134] In the data read/write method provided in the aforementioned embodiments, the storage service node records the load and response status of all read/write requests, so that when a new read/write request arrives, a less-loaded data storage node can be selected for it, achieving load balancing. This mechanism works in most cases, but in extreme cases a data storage node may still become overloaded or have too many connections. To address this, in some embodiments, if the load of a data storage node 501 is greater than or equal to a set load threshold, or its number of connections with other nodes exceeds a set target threshold, the data storage node 501 may refrain from responding to, or drop, write requests sent by the cloud disk client 30. This relieves the load on the data storage node and improves the stability of the storage cluster. Correspondingly, if the cloud disk client 30 does not receive a response to the write request (and/or to a read request) within a set time period, it can resend the metadata of the write request to the first storage service node 402a to request that the first storage service node 402a reselect data block replicas.
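The overload protection and retry in [0134] can be sketched in Python as a node-side admission check and a client-side retry loop. `request_replicas` (resend metadata, receive replica identifiers) and `try_write` (returns True only if the write is acknowledged within the timeout) are hypothetical callables. `max_attempts` is an assumed cap; the source does not bound the number of retries.

```python
from typing import Callable, List, Optional


def node_accepts_write(load: float, load_threshold: float,
                       connections: int, connection_threshold: int) -> bool:
    """Data storage node side: refuse (or drop) writes when overloaded."""
    return load < load_threshold and connections <= connection_threshold


def write_with_reselection(request_replicas: Callable[[], List[str]],
                           try_write: Callable[[List[str], float], bool],
                           timeout_s: float,
                           max_attempts: int = 3) -> Optional[List[str]]:
    """Client side: on timeout, resend write metadata so new replicas are selected."""
    for _ in range(max_attempts):
        replicas = request_replicas()       # (re)send metadata to the first storage service node
        if try_write(replicas, timeout_s):  # True only if the write was acknowledged in time
            return replicas
    return None                             # give up after the assumed attempt cap
```

Because a silent drop looks the same to the client as a slow node, the timeout is the only signal the client needs. Each retry gives the storage service node a fresh chance to steer the write away from the overloaded replica.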
[0135] In addition to the service system provided in the foregoing embodiments, this disclosure also provides data writing and data reading methods. The data writing and data reading methods provided in this disclosure are described below from the perspectives of cloud disk clients and storage service nodes, respectively.
[0136] Figure 5 is a flowchart illustrating the data writing method provided in this embodiment. This method is primarily applicable to cloud disk clients. As shown in Figure 5, the data writing method mainly includes:
[0137] 51. In response to a write request, send the metadata of the write request to the first storage service node among multiple storage service nodes. The metadata of the write request triggers the first storage service node to select a target number of data block replicas for the write request and to return the identifiers of the target number of data block replicas together with the address information of the target data storage node where those replicas are located. The first storage service node is the storage service node serving the first cloud disk to be written to by the write request.
[0138] 52. Based on the identifiers of the target number of data block replicas and the address information of the target data storage node, write the data to be written corresponding to the write request to the target data storage node.
[0139] Figure 6 is a flowchart illustrating another data writing method provided in this embodiment of the present disclosure. This method is mainly applicable to storage service nodes. As shown in Figure 6, the data writing method mainly includes:
[0140] 601. Obtain the metadata of write requests sent by cloud disk clients in the computing cluster.
[0141] 602. In response to the metadata of the write request, select the target number of data block replicas from the first cloud disk for the write request based on the number of replicas required by the preset data storage method.
[0142] 603. Request the target number of data block replicas from the metadata storage node, so that the metadata storage node returns the address information of the target data storage node where the target number of data block replicas are located.
[0143] 604. Send the identifiers of the target number of data block replicas and the address information of the target data storage node to the cloud disk client, so that the cloud disk client can write the data to be written corresponding to the write request to the target data storage node according to the identifiers of the target number of data block replicas and the address information of the target data storage node.
[0144] In the method embodiments provided in this disclosure, the roles of nodes such as cloud disk clients, storage service nodes, and data storage nodes, as well as the system architecture of the service system they belong to, can be found in the relevant content of the aforementioned system embodiments, and will not be repeated here.
[0145] To reduce network bandwidth consumption during data read / write operations, for the cloud disk client, in step 51, in response to a write request, the metadata of the write request can be sent to the first storage service node among multiple storage service nodes. The metadata of the write request may include: the identifier of the first cloud disk carried in the write request, the starting logical block address (LBA) to be written, and the length of the data to be written. The first storage service node refers to the storage service node among the multiple storage service nodes that serves the cloud disk to be written to in the write request. For ease of description, the cloud disk to be written to in the write request is defined as the first cloud disk. The implementation method for the cloud disk client to determine the address information of the first storage service node serving the first cloud disk and to send the metadata of the write request to the first storage service node can be found in the relevant content of the foregoing embodiments, and will not be repeated here.
[0146] Accordingly, for the first storage service node, in step 601, the metadata of the write request can be obtained. Since the data received by the first storage service node is the metadata of the write request, and not the data to be written, the first storage service node will not directly write data to the data storage node. Instead, in step 602, in response to the metadata of the write request, a target number of data block replicas are selected from the first cloud disk for the write request. In some embodiments, the target number of data block replicas can be selected for the write request according to the number R required by the preset data storage method, which can ensure that the number of selected data block replicas meets the requirements of the data storage method and improve the probability of successful data writing.
[0147] In some embodiments, the number P of data blocks to be requested can be determined in response to the metadata of the write request. In some embodiments, if a default number of data blocks per write is preconfigured, that preset number is used as the number P of data blocks to be requested.
[0148] In other embodiments, the identifier of the first cloud disk can be obtained from the metadata of the write request. Then, based on the identifier, the load information of the first cloud disk can be obtained, and based on the load information, the number of data blocks that the first cloud disk can support for concurrent writes can be determined as the number P of data blocks to be requested. Determining the number P of data blocks to be requested based on the cloud disk's load information allows the network bandwidth used for writing data blocks to be matched with the cloud disk's request processing capacity. When the cloud disk load is high, the number P of data blocks to be requested can be reduced, thereby reducing the amount of data written in a single operation and alleviating the pressure on the cloud disk; when the cloud disk load is low, the number P of data blocks to be requested can be increased, improving the cloud disk's IOPS and throughput.
[0149] In some embodiments, a pre-defined correspondence between the load of a cloud disk and the number of concurrently written data blocks can be established. Based on this correspondence, the load information of the first cloud disk can be queried to obtain the number of concurrently written data blocks that the first cloud disk can support, which is then used as the number P of data blocks to be requested.
[0150] Furthermore, the target number Q of data block replicas can be determined based on the number R required for the preset data storage method and the number P of data blocks to be requested. Specifically, the target number Q of data block replicas can be the product of the number R required for the preset data storage method and the number P of data blocks to be requested. That is, Q = R * P. Here, one data block corresponds to R data block replicas.
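The computation of P and Q in [0147] through [0150] can be sketched in Python. The load-to-concurrency table is invented for illustration; the source says only that such a correspondence is predefined. `blocks_to_request` and `target_replica_count` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical load -> concurrency correspondence ([0149]): rows are
# (minimum load, concurrently writable data blocks), sorted by load.
LOAD_TO_CONCURRENCY: List[Tuple[float, int]] = [(0.0, 4), (0.5, 2), (0.8, 1)]


def blocks_to_request(load: Optional[float], default_p: Optional[int] = None) -> int:
    """Number P of data blocks to request for one write."""
    if default_p is not None:  # [0147]: a preconfigured per-write default is used as-is
        return default_p
    if load is None:
        raise ValueError("need either a default P or the cloud disk's load")
    thresholds = [t for t, _ in LOAD_TO_CONCURRENCY]
    row = max(bisect_right(thresholds, load) - 1, 0)  # last row whose minimum <= load
    return LOAD_TO_CONCURRENCY[row][1]


def target_replica_count(r: int, p: int) -> int:
    """Q = R * P: each of the P data blocks needs R replicas."""
    return r * p
```

Under this table, a cloud disk at load 0.6 gets P = 2, and with three-replica storage (R = 3) the storage service node selects Q = 6 data block replicas. Higher load yields a smaller P, as [0148] describes.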
[0151] Furthermore, a target number of data block replicas (i.e., Q data block replicas) can be selected from the first cloud disk for the write request based on the performance parameters of the data blocks contained in the first cloud disk. Specifically, based on these performance parameters, the data blocks in an unsealed state are obtained from the first cloud disk, and among them, the data blocks whose free storage space is greater than or equal to the length of the data to be written are retained. Then, in ascending order of response latency, Q of these data blocks are selected as the Q data block replicas. Selecting replicas by performance parameters favors better-performing data blocks, which helps increase data writing speed and reduce the response latency of the write request.
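The three-step selection in [0151] (unsealed, enough free space, lowest latency first) maps directly onto a filter and a sort. The Python sketch below uses hypothetical field names. It raises an error when fewer than Q blocks qualify, which is an assumption because the source does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockPerf:
    block_id: str
    sealed: bool       # sealed blocks accept no further writes
    free_bytes: int    # remaining free storage space
    latency_ms: float  # recent response latency


def select_replicas(blocks: List[BlockPerf], write_len: int, q: int) -> List[str]:
    """Pick Q unsealed blocks with enough free space, lowest latency first."""
    candidates = [b for b in blocks if not b.sealed and b.free_bytes >= write_len]
    candidates.sort(key=lambda b: b.latency_ms)
    if len(candidates) < q:
        # Assumed behavior: the source does not specify this case.
        raise RuntimeError(f"only {len(candidates)} eligible blocks, need {q}")
    return [b.block_id for b in candidates[:q]]
```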
[0152] After selecting Q data block replicas, in step 603, these Q data block replicas can be requested from the metadata storage node, that is, a request can be made to allocate the Q data block replicas to the aforementioned write request. In response to the request from the first storage service node, the metadata storage node can return the address information of the target data storage node where the Q data block replicas are located to the first storage service node.
[0153] Specifically, the identifiers of Q data block replicas can be sent to the metadata storage node. The identifier of a data block replica is the identifier of the corresponding data block in the cloud disk. The metadata storage node can obtain the address information of the target data storage node corresponding to each of the Q data block replicas from the pre-stored mapping between data block identifiers and data storage node address information, and return the address information of the target data storage nodes containing the Q data block replicas to the first storage service node.
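On the metadata storage node, the lookup in [0153] is a query against the pre-stored mapping from data block identifiers to data storage node addresses. In this minimal Python sketch, addresses are plain strings such as `"10.0.0.1:7000"` (an illustrative format), and rejecting unknown identifiers is an assumed behavior.

```python
from typing import Dict, List


def resolve_replica_addresses(replica_ids: List[str],
                              block_to_node: Dict[str, str]) -> Dict[str, str]:
    """Metadata storage node: map each replica (data block) id to its node address."""
    missing = [rid for rid in replica_ids if rid not in block_to_node]
    if missing:
        raise KeyError(f"unknown data block replicas: {missing}")
    return {rid: block_to_node[rid] for rid in replica_ids}
```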
[0154] Furthermore, in step 604, the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located can be sent to the cloud disk client. Thus, in step 52, the cloud disk client can write the data to be written corresponding to the write request to the target data storage node based on the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located.
[0155] In some embodiments, the first storage service node may also send a preset data storage method to the cloud disk client. Accordingly, the cloud disk client may, based on the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas reside, write the data to be written corresponding to the write request to the target data storage node according to the preset data storage method. The first storage service node may encapsulate the identifiers of the aforementioned Q data block replicas, the address information of the target data storage node where the Q data block replicas reside, and the preset data storage method in the same response message and send it to the cloud disk client.
[0156] In other embodiments, the cloud disk client is pre-configured with a data storage method. This data storage method is the same as the data storage method pre-configured in the first storage service node. Accordingly, the cloud disk client can write the data to be written corresponding to the write request to the target data storage node according to the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located, in accordance with the pre-configured data storage method.
[0157] Specifically, the first storage service node can obtain the starting LBA to be written from the metadata of the write request and determine the target data block offset corresponding to that starting LBA from a pre-stored first correspondence between LBAs and data block offsets. That is, the first storage service node queries the first correspondence with the starting LBA and uses the data block offset it maps to as the target data block offset. The first storage service node can then send the target data block offset to the cloud disk client, for example by encapsulating the identifiers of the Q data block replicas, the address information of the target data storage nodes where they are located, the preset data storage method, and the target data block offset in the same response message.
[0158] Accordingly, the cloud disk client can write the data to be written to the target data storage node according to the aforementioned data storage method, based on the identifiers of the Q data block replicas, the offset within the target data block, and the address information of the target data storage node where the Q data block replicas are located. Specifically, the cloud disk client can determine the data block replicas from the target data storage node based on the identifiers of the Q data block replicas and the address information of the target data storage node where the Q data block replicas are located; and write the data to be written to the data block replica starting from the offset within the target data block of the data block replica.
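The response message of [0155] and [0157] and the client write of [0158] can be sketched in Python. Several points are assumptions: the first correspondence is an exact-match dictionary from starting LBA to offset, `send(address, replica_id, offset, data)` is a hypothetical transport call, and the data storage method is plain replication, so every replica receives the full payload. An erasure-coded method would split the data differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AllocationResponse:
    replica_ids: List[str]          # identifiers of the Q data block replicas
    node_addresses: Dict[str, str]  # replica id -> target data storage node address
    storage_method: str             # preset data storage method, e.g. "3-replica"
    target_offset: int              # offset within the target data block


def build_response(start_lba: int, lba_to_offset: Dict[int, int],
                   replica_ids: List[str], node_addresses: Dict[str, str],
                   storage_method: str) -> AllocationResponse:
    """Storage service node: look up the offset for the starting LBA and pack one message."""
    return AllocationResponse(replica_ids, node_addresses, storage_method,
                              lba_to_offset[start_lba])


def client_write(resp: AllocationResponse, data: bytes,
                 send: Callable[[str, str, int, bytes], None]) -> None:
    """Cloud disk client: write the full payload to every replica at the target offset."""
    for rid in resp.replica_ids:
        send(resp.node_addresses[rid], rid, resp.target_offset, data)
```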
[0159] In this embodiment, the cloud disk client no longer sends write requests to the storage service node; instead, it sends the metadata of the write request to the storage service node. The storage service node can respond to the metadata by selecting a target number of data block replicas for the write request and obtain the address information of the data block replicas from the metadata storage node. Then, the storage service node returns the address information of the data block replicas to the cloud disk client. In this way, the cloud disk client can directly write the data to be written to the data storage node based on the address information of the data block replicas. This write method eliminates the need for the storage service node to forward the data to be written, reducing the bandwidth amplification required for the storage cluster to serve write requests, thereby reducing the network bandwidth consumption of the storage cluster by write requests and helping to improve the throughput of the storage cluster.
[0160] In this embodiment of the disclosure, the number Q of data block replicas is an integer multiple of the number R of replicas required by the preset data storage method, the multiplier being the number P of requested data blocks (Q = R * P). If P = 1, the number Q of data block replicas equals the number R of replicas required by the preset data storage method.
[0161] For the embodiment where P=1, the cloud disk client can directly write the data to be written into the Q data block replicas in the target data storage node according to the aforementioned data storage method, based on the identifiers of the Q data block replicas, the offset within the target data block, and the address information of the target data storage node where the Q data block replicas are located.
[0162] In other embodiments, P ≥ 2, meaning there are multiple data blocks to be requested, while the data to be written only needs to be written to R data block replicas. To select the R data block replicas required for the data to be written, the first storage service node can sort the P data blocks to be requested based on their performance parameters, obtaining a sorting result for the P data blocks. The first storage service node can also send the sorting result of the P data blocks to the cloud disk client. Specifically, the first storage service node can encapsulate the sorting result, along with the identifiers of the aforementioned Q data block replicas, the address information of the target data storage nodes where the Q data block replicas reside, the preset data storage method, and the offset within the target data block, in the same response message and return it to the cloud disk client.
[0163] The cloud disk client can prioritize writing data to the data block replicas corresponding to the highest-ranking data blocks based on the sorting results. Specifically, each time, the cloud disk client selects the highest-ranking target data block from the data blocks that have not yet been attempted to write data, based on the sorting results. Then, it retrieves the identifiers of R target data block replicas corresponding to the target data block from the identifiers of the target number of data block replicas; and retrieves the address information of the target data storage nodes where the R target data block replicas reside from the address information of the target data storage nodes. Then, based on the identifiers of the R target data block replicas and the address information of the target data storage nodes where the R target data block replicas reside, it writes the data to be written to the R target data block replicas according to the aforementioned preset data storage method. If the writing of the data to be written to the R target data block replicas fails, the operation of selecting the highest-ranking target data block from the data blocks that have not yet been attempted to write data is re-executed, until the data to be written is successfully written or all P data blocks fail to be written.
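The best-first fallback loop of [0163] can be sketched in Python. `write_replicas` is a hypothetical callable that writes the data to one block's R replicas according to the preset data storage method and reports success. The sketch also collects failed replica identifiers for the report described in [0168], and it simplifies by treating all R replicas of a failed block as failed.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def write_by_ranking(ranking: Sequence[str],
                     replicas_of: Dict[str, List[str]],
                     write_replicas: Callable[[List[str]], bool],
                     ) -> Tuple[Optional[str], List[str]]:
    """Try the P blocks best-first until one block's R replicas are all written.

    Returns (block that succeeded, or None if all P failed; replica ids that failed).
    """
    failed: List[str] = []
    for block_id in ranking:              # highest-ranking untried block first
        replicas = replicas_of[block_id]  # the R replicas of this block
        if write_replicas(replicas):      # write per the preset storage method
            return block_id, failed
        failed.extend(replicas)           # simplification: all R counted as failed
    return None, failed                   # all P blocks failed
```

If the top-ranked block "C2" fails and "C1" succeeds, the result is `("C1", ["C2a", "C2b"])`, which gives the client the failed identifiers to piggyback on the write success metadata.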
[0164] In this embodiment, the cloud disk client prioritizes writing data to data blocks with better performance based on the performance ranking of the data blocks. This can improve the data writing speed, reduce the response latency of data writing, and increase the probability of successful data writing.
[0165] If the data to be written is successfully written, the cloud disk client can send write success metadata to the first storage service node. This write success metadata includes the amount of data written to each data storage node and the response latency of the write request. The first storage service node updates the performance parameters of the data block replica based on the amount of data written to each data storage node and the response latency. The updated performance parameters of the data block replica can provide a reference for subsequent write requests in selecting the appropriate data block replica.
[0166] Specifically, the first storage service node can update the free storage resource information of the data block replicas in the target data storage node based on the amount of data the write request wrote to that node. For example, it can subtract the amount of data written to a data block replica from the replica's original free storage space to obtain the new free storage resource information. The first storage service node can also update the response latency of the data block replicas in the target data storage node based on the response latency of the write request.
[0167] In some embodiments, the write success metadata further includes: the starting LBA carried in the write request, the identifier of the successfully written data block, and the offset within the successfully written data block. Accordingly, the first storage service node can establish a second correspondence between the starting LBA of the write request and the identifier and offset within the successfully written data block, and persistently store the second correspondence. By persistently storing the second correspondence, when a read request arrives, the LBA carried in the read request can be used to query the persistently stored second correspondence to obtain the identifier and offset within the data block corresponding to the LBA carried in the read request. Based on the identifier and offset within the data block corresponding to the LBA carried in the read request, the data required by the read request can be read. In other words, persistently storing the second correspondence provides a basis for subsequent read requests to read data.
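The second correspondence in [0167] is essentially a persistent map from a write's starting LBA to a (data block identifier, offset) pair. The sketch below assumes a JSON file as the persistent store and exact-match lookups on the starting LBA; the class name, file format, and rewrite-the-whole-file strategy are illustrative assumptions, and a production system would use a crash-safe store.

```python
import json
from typing import Dict, Optional, Tuple


class LbaIndex:
    """Second correspondence: starting LBA -> (data block id, offset within block)."""

    def __init__(self, path: str):
        self.path = path  # hypothetical file used as the persistent store
        self.entries: Dict[int, Tuple[str, int]] = {}

    def record(self, start_lba: int, block_id: str, offset: int) -> None:
        # Called when write success metadata arrives; persisted immediately.
        self.entries[start_lba] = (block_id, offset)
        self._persist()

    def lookup(self, lba: int) -> Optional[Tuple[str, int]]:
        # Exact match on the starting LBA carried in the read request.
        return self.entries.get(lba)

    def _persist(self) -> None:
        # JSON object keys must be strings; the whole table is rewritten each time.
        with open(self.path, "w") as f:
            json.dump({str(k): list(v) for k, v in self.entries.items()}, f)

    @classmethod
    def load(cls, path: str) -> "LbaIndex":
        idx = cls(path)
        with open(path) as f:
            idx.entries = {int(k): (v[0], v[1]) for k, v in json.load(f).items()}
        return idx
```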
[0168] In some embodiments, if the data to be written is successfully written or if writes to all P data blocks fail, the cloud disk client can send the identifier of the failed data block replica to the first storage service node. Correspondingly, the first storage service node can change the status information in the performance parameter information of the failed data block replica to a sealed state based on the identifier of the failed data block replica. This provides a reference for selecting data block replicas for subsequent write requests; for example, data block replicas in a sealed state can be excluded when selecting data block replicas for subsequent write requests, which helps increase the probability that subsequent write requests succeed. In embodiments where the data to be written is successfully written, the cloud disk client can carry the identifier of the failed data block replica in the aforementioned write success metadata. This reduces the number of network communications between the cloud disk client and the storage service node and lowers the frequency of I/O access to the storage service node.
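A minimal sketch of the sealing step in [0168], assuming the status information is kept as a string per replica identifier; the state names `"normal"` and `"sealed"` and both function names are hypothetical.

```python
from typing import Dict, Iterable, List


def seal_failed(status: Dict[str, str], failed_ids: Iterable[str]) -> None:
    """Mark each reported failed replica as sealed (status: replica id -> state)."""
    for rid in failed_ids:
        if rid in status:  # ignore identifiers the node does not track
            status[rid] = "sealed"


def selectable(status: Dict[str, str]) -> List[str]:
    # Replicas in the sealed state are skipped when choosing replicas for new writes.
    return sorted(rid for rid, state in status.items() if state != "sealed")
```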
[0169] In the embodiment where all P data blocks fail to be written, the cloud disk client can also resend the metadata of the write request to the first storage service node to trigger the first storage service node to reselect a data block replica for the write request, that is, to re-execute the data writing process provided in the aforementioned embodiment, which can reduce the probability of data loss and ensure the high availability of the data storage service.
[0170] In some embodiments, the service system can also support concurrent write requests; that is, the aforementioned write requests may be multiple concurrent write requests. In this case, the first storage service node is configured to distribute the multiple write requests across the P data blocks using a load balancing strategy. Distributing the concurrent write requests across the selected P data blocks in this way helps balance the load among the data blocks.
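The disclosure does not specify which load balancing strategy is used in [0170]. As one plausible choice, the sketch below uses a greedy least-loaded assignment (largest requests first, each to the data block with the fewest bytes assigned so far); the function name and the use of request size as the load measure are assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def assign_least_loaded(block_ids: List[str], request_sizes: List[int]) -> Dict[int, str]:
    """Map each request index to the data block with the least bytes assigned so far."""
    # Heap entries: (assigned_bytes, tie_break_index, block_id).
    heap: List[Tuple[int, int, str]] = [(0, i, b) for i, b in enumerate(block_ids)]
    heapq.heapify(heap)
    assignment: Dict[int, str] = {}
    # Place larger requests first so the greedy split stays balanced.
    order = sorted(range(len(request_sizes)), key=lambda i: -request_sizes[i])
    for req in order:
        load, tie, block = heapq.heappop(heap)
        assignment[req] = block
        heapq.heappush(heap, (load + request_sizes[req], tie, block))
    return assignment
```

Round-robin would be a simpler alternative when request sizes are roughly uniform.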
[0171] After allocating multiple write requests to P data blocks, the first storage service node can return the identifier of the data block replica corresponding to each write request, the address information of the target data storage node where these data block replicas are located, and the preset data storage method to the cloud disk client that sent the write request. The cloud disk client can then write the data to be written corresponding to the write request into these data block replicas based on the identifier of the data block replica, the address information of the target data storage node where these data block replicas are located, and the preset data storage method. For specific data writing methods, please refer to the relevant content in the foregoing embodiments, which will not be repeated here.
[0172] In addition to reducing write bandwidth, the embodiments of this disclosure can also reduce data read bandwidth. The data read method provided by the embodiments of this disclosure will be described by way of example below.
[0173] Figure 7 is a flowchart illustrating the data reading method provided in this embodiment. This method is primarily applicable to cloud disk clients. As shown in Figure 7, the data reading method mainly includes:
[0174] 701. Send a read request to the second storage service node among multiple storage service nodes; the second storage service node is the storage service node of the second cloud disk to be read by the read request.
[0175] 702. Obtain the address information of the data block to be read in the read request returned by the second storage service node.
[0176] 703. Read data from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
[0177] Figure 8 is a flowchart illustrating another data reading method provided in this embodiment. This method is mainly applicable to storage service nodes. As shown in Figure 8, the data reading method mainly includes:
[0178] 801. Obtain the read request sent by the cloud disk client in the computing cluster.
[0179] 802. In response to a read request, obtain the address information of the data block to be read; the data block to be read is located on the data storage node.
[0180] 803. Return the address information of the data block to be read to the cloud disk client so that the cloud disk client can read the data from the data storage node where the data block to be read is located based on the address information of the data block to be read.
[0181] When a user needs to read data, a read request can be sent to the cloud disk client via the VM. This read request may include: the identifier of the cloud disk to be read (defined as the second cloud disk), the starting LBA of the data to be read, and its length. Further, for the cloud disk client, in step 701, a read request can be sent to the storage service node serving the second cloud disk (defined as the second storage service node). For a detailed implementation of how the cloud disk client determines the second storage service node, please refer to the aforementioned content regarding the cloud disk client determining the first storage service node; it will not be repeated here.
[0182] Accordingly, in step 801, the second storage service node can obtain the read request, and in step 802, in response to the read request, obtain the address information of the data block to be read. Specifically, the target LBA and target data length of the data to be read can be obtained from the read request, where the target LBA is the starting LBA of the data to be read. The second storage service node can then determine the identifier of the data block corresponding to the target LBA, and the offset within that data block, from the stored second correspondence between starting LBAs and the identifiers of, and offsets within, successfully written data blocks.
[0183] Furthermore, a quadruple consisting of the target LBA, the target data length, the identifier of the data block corresponding to the target LBA, and the offset within the data block can be used as the address information of the data block to be read. The quadruple can be represented as (LBA, len, Ci, Offset), where LBA is the starting LBA carried by the read request (i.e., the aforementioned target LBA), len is the target data length, Ci is the identifier of the data block corresponding to the target LBA, and Offset is the offset within the data block. Since the data to be read by a read request may consist of multiple data fragments, each with its own starting LBA and length, and different data fragments may be stored in different data blocks, the address information of the data block to be read may consist of multiple quadruples.
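Building the address information of [0182] and [0183] can be sketched as one (LBA, len, Ci, Offset) quadruple per fragment, looked up in the second correspondence. The sketch assumes an in-memory dict for that correspondence and exact-match lookups on each fragment's starting LBA; `ReadAddr` and `resolve_read` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


class ReadAddr(NamedTuple):
    lba: int       # starting LBA of the fragment (target LBA)
    length: int    # target data length (len)
    block_id: str  # Ci, identifier of the data block holding the fragment
    offset: int    # Offset within data block Ci


def resolve_read(fragments: List[Tuple[int, int]],
                 index: Dict[int, Tuple[str, int]]) -> List[ReadAddr]:
    """Build one (LBA, len, Ci, Offset) quadruple per requested (lba, length) fragment."""
    out: List[ReadAddr] = []
    for lba, length in fragments:
        if lba not in index:
            raise KeyError(f"no written data recorded for LBA {lba}")
        block_id, offset = index[lba]
        out.append(ReadAddr(lba, length, block_id, offset))
    return out
```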
[0184] Furthermore, in step 803, the address information of the data block to be read can be returned to the cloud disk client. In step 702, the cloud disk client can obtain the address information of the data block to be read, and in step 703, according to the address information of the data block to be read, read the data from the data storage node where the data block to be read is located. Specifically, the data block corresponding to the target LBA can be determined from the data storage node where the data block to be read is located based on the identifier of the data block corresponding to the target LBA. Then, data of the target data length can be read starting from the offset within the data block corresponding to the target LBA.
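On the client side, step 703 reduces to reading `len` units starting at `Offset` inside block `Ci`. In this sketch the data storage node is stood in for by a dict from block identifier to bytes, lengths and offsets are assumed to be in bytes, and the names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class ReadAddr(NamedTuple):
    lba: int       # starting LBA of the fragment
    length: int    # number of bytes to read (assumed unit)
    block_id: str  # Ci
    offset: int    # Offset within Ci, in bytes (assumed unit)


def read_fragments(node_blocks: Dict[str, bytes], addrs: List[ReadAddr]) -> bytes:
    """Read each fragment from its data block and concatenate them in request order."""
    parts: List[bytes] = []
    for a in addrs:
        block = node_blocks[a.block_id]
        chunk = block[a.offset:a.offset + a.length]
        if len(chunk) != a.length:
            # Slicing past the end of a block silently shortens the result; treat as an error.
            raise ValueError(f"short read on block {a.block_id}")
        parts.append(chunk)
    return b"".join(parts)
```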
[0185] In this embodiment, the storage service node responds to a read request by returning the address information of the data block to be read to the cloud disk client. The cloud disk client then reads the data directly from the data storage node based on the address information of the data block. This data reading method eliminates the need for the storage service node to read data from the data storage node and then forward the data to the cloud disk client. This reduces the bandwidth required for the storage cluster to serve read requests, thereby reducing the network bandwidth consumption of the storage cluster by read requests and helping to improve the throughput of the storage cluster.
[0186] Based on the data read/write process provided in the foregoing embodiments, it can be seen that the data read/write method provided in this disclosure saves network bandwidth compared with the data read/write method shown in Figure 1. However, in some cases the advantages of the aforementioned layered storage cluster deployment (cloud disk client, storage service layer, persistent storage layer) should also be considered, meaning that in some scenarios the data read/write method shown in Figure 1 still needs to be used. The request processing methods for these scenarios are described in detail below.
[0187] In some embodiments, the cloud disk client can also obtain the current number of connections with data storage nodes. If the current number of connections between the cloud disk client and multiple data storage nodes is less than a set threshold, then in response to the write request, the client sends the metadata of the write request to the first storage service node. That is, the write request is processed using the method provided in the foregoing embodiments where the cloud disk client directly writes data to the data storage node. If the current number of connections between the cloud disk client and multiple data storage nodes is greater than or equal to the set threshold, then the write request is sent to the first storage service node. In response to the write request, the first storage service node can write the data to be written to the target data storage node according to a preset data storage method, based on the identifiers of the target number of data block replicas and the address information of the target data storage node.
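The connection-count switch in [0187] amounts to a single comparison. The sketch below is illustrative; the enum and function names are hypothetical, and how the threshold is chosen is not specified in the text.

```python
from enum import Enum


class WritePath(Enum):
    DIRECT = "send metadata only; client writes to data storage nodes"
    VIA_SERVICE = "send the write request; storage service node writes the data"


def choose_write_path(current_connections: int, threshold: int) -> WritePath:
    # Below the threshold the client can afford direct connections to data storage nodes.
    if current_connections < threshold:
        return WritePath.DIRECT
    # At or above the threshold, fall back to the layered path of Figure 1.
    return WritePath.VIA_SERVICE
```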
[0188] Similarly, for a read request, if the number of connections between the cloud disk client and multiple data storage nodes is less than a set threshold, the cloud disk client can send a read request carrying a first flag to the second storage service node, instructing the second storage service node to return the address information of the data block to be read. The second storage service node can determine, based on the first flag, that the data reading method is for the cloud disk client to directly read data from the data storage node. Accordingly, the second storage service node can determine the address information of the data block to be read and return this address information to the cloud disk client. For details, please refer to the relevant content in the foregoing embodiments.
[0189] Accordingly, if the number of connections between the cloud disk client and multiple data storage nodes is greater than or equal to a set threshold, the cloud disk client can send a read request carrying a second flag to the second storage service node. This instructs the second storage service node to read data from the data storage node containing the data block to be read, based on the address information of the data block. The second storage service node can then determine the data reading method to be performed by itself based on the second flag. Accordingly, the second storage service node can determine the address information of the data block to be read and read the data from the data storage node containing the data block. Afterward, the second storage service node returns the read data to the cloud disk client.
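For reads ([0188] and [0189]), the client encodes its choice as a flag and the second storage service node dispatches on it. The numeric flag values, the `(block_id, offset, length)` address tuples, and the `fetch` callback that stands in for reading from a data storage node are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple, Union

FIRST_FLAG = 1   # hypothetical encoding: return address information to the client
SECOND_FLAG = 2  # hypothetical encoding: storage service node reads and returns the data

Addr = Tuple[str, int, int]  # (block_id, offset, length)


def read_flag(current_connections: int, threshold: int) -> int:
    """Client side: choose the flag carried in the read request."""
    return FIRST_FLAG if current_connections < threshold else SECOND_FLAG


def handle_read(flag: int, addrs: List[Addr],
                fetch: Callable[[str, int, int], bytes]) -> Union[List[Addr], bytes]:
    """Second storage service node: dispatch on the flag in the read request."""
    if flag == FIRST_FLAG:
        return addrs  # the client will read from the data storage node itself
    if flag == SECOND_FLAG:
        # The service node reads from the data storage node and forwards the data.
        return b"".join(fetch(b, o, n) for b, o, n in addrs)
    raise ValueError(f"unknown read flag {flag}")
```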
[0190] In this embodiment, by introducing a threshold control mechanism, this approach enables autonomous switching between modes where the cloud disk client directly reads and writes to data storage nodes and where the storage service node reads and writes to data storage nodes. This retains the advantages and simplicity of tiered storage and offers good compatibility with the logic of traditional storage service node data reading and writing. It preserves the performance advantages of connections between the cloud disk client and data storage nodes (i.e., reducing network bandwidth consumption on the storage cluster along the data path) while avoiding excessive resource consumption caused by too many connections between the cloud disk client and data storage nodes. This is an effective resource management and performance optimization strategy.
[0191] In other embodiments, for write requests, the cloud disk client can check which data processing operation a write request implements. If the write request does not implement a preset data processing operation, the client responds by sending the metadata of the write request to the first storage service node; that is, the write request is processed using the method of the aforementioned embodiments, in which the cloud disk client writes data directly to the data storage node. If the cloud disk client detects that the write request implements a preset data processing operation, it sends the write request itself to the first storage service node. The first storage service node, in response to the write request, can write the data to be written to the target data storage node according to a preset data storage method, based on the identifiers of the target number of data block replicas and the address information of the target data storage node.
[0192] Similarly, for a read request, if the read request is not part of a pre-defined data processing operation, the cloud disk client can send a read request carrying a first flag to the second storage service node, instructing the second storage service node to return the address information of the data block to be read. The second storage service node can determine, based on the first flag, that the data reading method is for the cloud disk client to directly read data from the data storage node. Accordingly, the second storage service node can determine the address information of the data block to be read and return this address information to the cloud disk client. For details, please refer to the relevant content in the foregoing embodiments.
[0193] Accordingly, if the read request belongs to a preset data processing operation, the cloud disk client 30 can send the read request carrying the second flag to the second storage service node, instructing the second storage service node to read data from the data storage node where the data block to be read is located, based on the address information of the data block to be read. The second storage service node can determine, based on the second flag, that it is to perform the data reading itself. Accordingly, the second storage service node can determine the address information of the data block to be read in the read request and read the data from the data storage node where that data block is located. Afterwards, the second storage service node returns the read data to the cloud disk client.
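The operation-based variant in [0191] to [0193] replaces the connection count with a membership test. The disclosure does not name the preset data processing operations, so `op_a` and `op_b` below are placeholders, and the flag values are hypothetical.

```python
from typing import FrozenSet, Optional

# Placeholder set of preset data processing operations that the storage service
# node should carry out itself; the source does not name them.
PRESET_OPERATIONS: FrozenSet[str] = frozenset({"op_a", "op_b"})

FIRST_FLAG, SECOND_FLAG = 1, 2  # hypothetical flag encodings


def route_by_operation(operation: Optional[str]) -> int:
    """Pick the read flag (or, analogously, the write path) from the request's operation."""
    if operation is not None and operation in PRESET_OPERATIONS:
        return SECOND_FLAG  # storage service node accesses the data storage nodes
    return FIRST_FLAG       # client accesses the data storage nodes directly
```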
[0194] This approach allows the cloud disk client to switch automatically between reading from and writing to the data storage nodes directly and having the storage service node read from and write to the data storage nodes. It retains the advantages and simplicity of tiered storage and offers good compatibility with the traditional data read/write logic of storage service nodes. When read/write requests implement the preset data processing operations, having the storage service node read from and write to the data storage nodes makes these operations easier to implement, reducing the complexity and operational cost of implementing and deploying them.
[0195] In the data read/write method provided in the aforementioned embodiments, the storage service node records the load and response status of all read/write requests. When a new read/write request arrives, a less-loaded data storage node can be selected for the read/write, achieving load balancing. This mechanism works well in most cases, but in extreme cases a data storage node may still become overloaded or hold too many connections. To address this, in some embodiments, if a data storage node's load is greater than or equal to a set load threshold, or if the number of connections between the data storage node and other nodes exceeds a set target threshold, the data storage node may refrain from responding to write requests sent by the cloud disk client or may drop them. This relieves the load on the data storage node and improves the stability of the storage cluster. Correspondingly, if the cloud disk client does not receive a response to a write request and/or a response to a read request within a set time period, it can resend the metadata of the write request to the first storage service node to request that the first storage service node reselect a data block replica.
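The overload protection and client fallback in [0195] can be sketched as an admission check on the data storage node plus a bounded wait on the client. Thresholds, the callback names, and the convention that a `None` reply models a dropped request are illustrative assumptions; the sketch uses a worker thread and `queue.Queue.get(timeout=...)` from the Python standard library to implement the set time period.

```python
import queue
import threading
from typing import Callable, Optional


def node_accepts(load: float, load_threshold: float,
                 connections: int, conn_threshold: int) -> bool:
    # Reject when load >= load threshold or connections exceed the target threshold.
    return load < load_threshold and connections <= conn_threshold


def write_with_fallback(send_write: Callable[[], Optional[str]],
                        resend_metadata: Callable[[], str],
                        timeout_s: float) -> str:
    """Wait up to timeout_s for a write ack; otherwise ask the service node to reselect."""
    replies: "queue.Queue[Optional[str]]" = queue.Queue()
    # Daemon thread so a write that never returns does not block interpreter exit.
    threading.Thread(target=lambda: replies.put(send_write()), daemon=True).start()
    try:
        reply = replies.get(timeout=timeout_s)
        if reply is not None:  # None models a request the node silently dropped
            return reply
    except queue.Empty:
        pass  # no response within the set time period
    return resend_metadata()
```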
[0196] It should be noted that the execution subject of each step of the method provided in the above embodiments can be the same device, or the method can be executed by different devices. For example, the execution subject of steps 51 and 52 can be device A; or the execution subject of step 51 can be device A, and the execution subject of step 52 can be device B; and so on.
[0197] Furthermore, some processes described in the above embodiments and accompanying drawings include multiple operations that appear in a specific order. However, it should be clearly understood that these operations may not be executed in the order they appear herein, or they may be executed in parallel. The operation numbers, such as 51, 52, etc., are merely used to distinguish different operations and do not represent any execution order. In addition, these processes may include more or fewer operations, and these operations may be executed sequentially or in parallel.
[0198] Accordingly, embodiments of this disclosure also provide a computer-readable storage medium storing computer instructions which, when executed by one or more processors, cause the one or more processors to perform the steps in the aforementioned data writing method and/or data reading method.
[0199] This disclosure also includes a computer program product comprising a computer program that, when executed by one or more processors, causes the one or more processors to perform the steps in the aforementioned data writing method and/or data reading method.
[0200] In this disclosure, the specific implementation form of the computer program product is not limited. In some embodiments, the computer program product may be implemented as an application (APP), a mini-program, a computer-side client, a program module, a plug-in, an installation package, a software development kit (SDK), an image file of an optical disc (such as an ISO file), or software in the form of Software as a Service (SaaS), but is not limited thereto.
[0201] Figure 9 is a schematic diagram of the structure of an electronic device provided in an embodiment of this disclosure. As shown in Figure 9, the electronic device includes: a memory 90a, a processor 90b, and a communication component 90c. The memory 90a is used to store computer programs.
[0202] The processor 90b is coupled to the memory 90a and the communication component 90c, and is used to execute the computer programs to perform the steps in the data writing and/or data reading methods provided in the foregoing embodiments. Specific implementation details of each step can be found in the relevant descriptions of the foregoing embodiments and will not be repeated here.
[0203] In some alternative embodiments, as shown in Figure 9, the electronic device may further include optional components such as a power supply component 90d, a display component 90e, and an audio component 90f. Figure 9 only schematically shows some components; it does not mean that the electronic device must include all the components shown in Figure 9, nor that the electronic device can include only the components shown in Figure 9.
[0204] Furthermore, the components within the dashed boxes in Figure 9 are optional, not mandatory, and their specific requirements depend on the product form of the electronic device. The electronic device in this embodiment can be a desktop computer, laptop computer, mobile phone, or IoT device; it can also be a traditional server, cloud server, or server cluster, or other server equipment.
[0205] In embodiments of this disclosure, the memory is used to store computer programs and can be configured to store various other data to support operation on its host device. The processor can execute the computer programs stored in the memory to implement corresponding control logic. The memory can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read Only Memory (EEPROM), Erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0206] In this embodiment of the disclosure, the processor can be any hardware processing device capable of executing the above-described method logic. Optionally, the processor can be a central processing unit (CPU), a graphics processing unit (GPU), or a microcontroller unit (MCU); it can also be a programmable device such as a field-programmable gate array (FPGA), a programmable array logic (PAL), a general array logic (GAL), or a complex programmable logic device (CPLD); or it can be an advanced RISC machine (ARM) or a system on chip (SoC), etc., but is not limited thereto.
[0207] In embodiments of this disclosure, the communication component is configured to facilitate wired or wireless communication between its host device and other devices. The device housing the communication component can access wireless networks based on communication standards, such as Wireless Fidelity (WiFi), 2G or 3G, 4G, 5G, or combinations thereof. In one exemplary embodiment, the communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In another exemplary embodiment, the communication component may also be implemented based on Near Field Communication (NFC), Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wide Band (UWB), Bluetooth (BT), or other technologies.
[0208] In embodiments of this disclosure, the display component may include a liquid crystal display (LCD) and a touch panel (TP). If the display component includes a touch panel, the display component may be implemented as a touchscreen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundaries of touch or swipe actions but also the duration and pressure associated with the touch or swipe operation.
[0209] In embodiments of this disclosure, a power supply component is configured to provide power to various components of the device in which it resides. The power supply component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device in which the power supply component resides.
[0210] In embodiments of this disclosure, the audio component can be configured to output and/or input audio signals. For example, the audio component includes a microphone (MIC) configured to receive external audio signals when the device containing the audio component is in an operating mode, such as call mode, recording mode, or voice recognition mode. The received audio signals can be further stored in memory or transmitted via a communication component. In some embodiments, the audio component also includes a speaker for outputting audio signals. For example, in devices with voice interaction capabilities, voice interaction with a user can be achieved through the audio component.
[0211] It should be noted that the terms "first" and "second" in this disclosure are used to distinguish different messages, devices, modules, etc.; they do not indicate a chronological order, nor do they require that the "first" and "second" items be of different types.
[0212] Those skilled in the art will understand that embodiments of this disclosure can be provided as methods, systems, or computer program products. Therefore, this disclosure can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this disclosure can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, compact disc read-only memory (CD-ROM), optical storage, etc.) containing computer-usable program code.
[0213] This disclosure is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
[0214] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
[0215] These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions, which execute on the computer or other programmable apparatus, provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
[0216] In a typical configuration, a computing device includes one or more processors (CPUs, etc.), input/output interfaces, network interfaces, and memory.
[0217] Memory may include non-persistent memory in computer-readable media, such as random-access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of computer-readable media.
[0218] Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, CD-ROM, Digital Video Disc (DVD) or other optical storage, magnetic tape, disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
[0219] It should also be noted that the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the aforementioned element.
[0220] The above description is merely an embodiment of this disclosure and is not intended to limit the scope of this disclosure. Various modifications and variations can be made to this disclosure by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of this disclosure should be included within the scope of the claims of this disclosure.
Claims
1. A service system, comprising: a computing cluster and a storage cluster; wherein the computing cluster is provided with a cloud disk client; the storage cluster comprises a storage service system and a data storage system; the storage service system comprises multiple storage service nodes; and the data storage system comprises multiple data storage nodes and a metadata storage node; the cloud disk client is configured to, in response to a write request, send metadata of the write request to a first storage service node among the multiple storage service nodes, the first storage service node being the storage service node that serves a first cloud disk to be written by the write request; the first storage service node is configured to, in response to the metadata of the write request, select a target number of data block replicas from the first cloud disk for the write request, and request the target number of data block replicas from the metadata storage node; the metadata storage node is configured to, in response to the request from the first storage service node, return to the first storage service node the address information of the target data storage node where the target number of data block replicas are located; the first storage service node is further configured to send the identifiers of the target number of data block replicas and the address information of the target data storage node to the cloud disk client; and the cloud disk client is configured to write the data to be written corresponding to the write request to the target data storage node according to the identifiers of the target number of data block replicas and the address information of the target data storage node.
2. The system according to claim 1, wherein the cloud disk client is further configured to send a read request to a second storage service node among the plurality of storage service nodes, the second storage service node being the storage service node that serves a second cloud disk to be read by the read request; the second storage service node is configured to, in response to the read request, obtain address information of a data block to be read by the read request and return the address information of the data block to be read to the cloud disk client; and the cloud disk client is configured to read data from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
3. A data writing method, applicable to a cloud disk client deployed in a computing cluster, the method comprising: in response to a write request, sending metadata of the write request to a first storage service node among a plurality of storage service nodes, wherein the metadata of the write request is used to trigger the first storage service node to select a target number of data block replicas for the write request and to return identifiers of the target number of data block replicas and address information of a target data storage node where the target number of data block replicas are located; the first storage service node is the storage service node serving a first cloud disk to which the write request is to be written; the target data storage node belongs to a data storage system, the data storage system further comprises a metadata storage node, and the address information of the target data storage node is obtained by the metadata storage node; and writing, based on the identifiers of the target number of data block replicas and the address information of the target data storage node, the data to be written corresponding to the write request to the target data storage node.
4. The method according to claim 3, wherein the metadata of the write request comprises a starting logical block address (LBA) to be written, and the method further comprises: obtaining an offset within a target data block and a data storage method, corresponding to the starting LBA, returned by the first storage service node; and the step of writing the data to be written corresponding to the write request to the target data storage node according to the identifiers of the target number of data block replicas and the address information of the target data storage node comprises: writing, based on the identifiers of the target number of data block replicas, the offset within the target data block, and the address information of the target data storage node, the data to be written to the target data storage node according to the data storage method.
5. The method according to claim 3, wherein the first number of data blocks to be requested is greater than one, the first number being determined by the first storage service node in response to the metadata of the write request; the method further comprises: obtaining, from the first storage service node, a data storage method and a sorting result of the plurality of data blocks to be requested, the sorting result being obtained by the first storage service node sorting the plurality of data blocks to be requested according to the performance parameter information of the data block replicas corresponding to each of the data blocks to be requested; and determining, based on the sorting result, the highest-ranked target data block among the data blocks to be requested that have not yet been written to; and the step of writing the data to be written corresponding to the write request to the target data storage node according to the identifiers of the target number of data block replicas and the address information of the target data storage node comprises: obtaining, from the identifiers of the target number of data block replicas, the identifier of a target data block replica corresponding to the target data block; obtaining, from the address information of the target data storage node, the address information of the target data storage node where the target data block replica is located; and writing, based on the identifier of the target data block replica and the address information of the target data storage node where the target data block replica is located, the data to be written to the target data block replica according to the data storage method.
6. The method according to claim 5, further comprising: if writing the data to be written to the target data block replica fails, re-executing the operation of determining, based on the sorting result, the highest-ranked target data block among the data blocks to be requested that have not yet been written to, until the data to be written is successfully written or writing has failed for all of the first number of data blocks; and when the data to be written is successfully written or writing has failed for all of the first number of data blocks, sending the identifier of each failed data block replica to the first storage service node, so that the first storage service node changes the status information in the performance parameter information of the failed data block replica to a sealed state according to the identifier of the failed data block replica.
7. The method according to any one of claims 3-6, further comprising: sending a read request to a second storage service node among the plurality of storage service nodes, so that the second storage service node returns to the cloud disk client the address information of a data block to be read by the read request, the second storage service node being the storage service node that serves a second cloud disk to be read by the read request; and reading data from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
8. The method according to any one of claims 3-6, further comprising: if the number of connections between the cloud disk client and the plurality of data storage nodes is less than a set threshold, executing the operation of sending, in response to the write request, the metadata of the write request to the first storage service node among the plurality of storage service nodes; or, if the number of connections is greater than or equal to the set threshold, or if the write request is detected to be a write request in a set data processing operation, sending the write request to the first storage service node, so that the first storage service node, in response to the write request, writes the data to be written to the target data storage node according to a preset data storage method, based on the identifiers of the target number of data block replicas and the address information of the target data storage node.
9. The method according to claim 7, further comprising: if the number of connections between the cloud disk client and the plurality of data storage nodes is less than a set threshold, sending a read request carrying a first flag to the second storage service node to instruct the second storage service node to return the address information of the data block to be read; or, if the number of connections is greater than or equal to the set threshold, or if the read request is detected to be a read request in a set data processing operation, sending a read request carrying a second flag to the second storage service node to instruct the second storage service node itself to read the data from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
10. A data writing method, applicable to a storage service node in a storage service system, wherein the storage service system belongs to a storage cluster, the storage cluster further comprises a data storage system, and the data storage system comprises a plurality of data storage nodes and a metadata storage node; the method comprises: obtaining metadata of a write request sent by a cloud disk client in a computing cluster, the storage service node serving a first cloud disk to which the write request is to be written; in response to the metadata of the write request, selecting a target number of data block replicas from the first cloud disk for the write request; requesting the target number of data block replicas from the metadata storage node, so that the metadata storage node returns the address information of a target data storage node where the target number of data block replicas are located; and sending identifiers of the target number of data block replicas and the address information of the target data storage node to the cloud disk client, so that the cloud disk client writes the data to be written corresponding to the write request to the target data storage node according to the identifiers of the target number of data block replicas and the address information of the target data storage node.
11. The method according to claim 10, wherein selecting, in response to the metadata of the write request, a target number of data block replicas from the first cloud disk for the write request comprises: determining, in response to the metadata of the write request, a first number of data blocks to be requested; determining the target number based on the first number and the number of replicas required by a preset data storage method; and selecting, based on the performance parameter information of the data blocks contained in the first cloud disk, the target number of data block replicas from the first cloud disk for the write request, wherein each data block to be requested corresponds to that number of data block replicas.
12. The method according to claim 11, wherein determining, in response to the metadata of the write request, a first number of data blocks to be requested comprises: obtaining an identifier of the first cloud disk from the metadata of the write request; obtaining load information of the first cloud disk based on the identifier of the first cloud disk; and determining the first number based on the load information of the first cloud disk.
13. The method according to claim 10, further comprising: obtaining, from the metadata of the write request, a starting logical block address (LBA) to be written; determining, based on a pre-stored first correspondence between LBAs and offsets within data blocks, the offset within a target data block corresponding to the starting LBA; and sending the offset within the target data block to the cloud disk client, so that the cloud disk client writes the data to be written to the target data storage node according to the data storage method, based on the identifiers of the target number of data block replicas, the offset within the target data block, and the address information of the target data storage node.
14. The method according to claim 11, further comprising: obtaining write-success metadata sent by the cloud disk client when the data to be written has been successfully written, the write-success metadata comprising the amount of data written to each data storage node by the write request and the response latency of the write request; and updating the performance parameter information of the data block replicas based on the amount of data written to the target data storage node by the write request and the response latency of the write request.
15. The method according to claim 14, wherein the write-success metadata further comprises the starting LBA carried by the write request, the identifier of the successfully written data block, and the offset within the successfully written data block; and the method further comprises: establishing a second correspondence among the starting LBA, the identifier of the successfully written data block, and the offset within the successfully written data block, and persistently storing the second correspondence.
16. The method according to claim 11, wherein the first number is greater than one, and the method further comprises: sorting the plurality of data blocks to be requested by performance, based on the performance parameter information of the data block replicas corresponding to each of the data blocks to be requested, to obtain a sorting result of the plurality of data blocks to be requested; and sending the sorting result to the cloud disk client, so that the cloud disk client attempts, in the order given by the sorting result, to write data to the data block replicas corresponding to the plurality of data blocks to be requested.
17. The method according to claim 11, wherein the write request comprises a plurality of concurrent write requests, and the method further comprises: distributing the plurality of write requests among the first number of data blocks using a load balancing strategy; and the step of sending the identifiers of the target number of data block replicas, the address information of the target data storage node, and the data storage method to the cloud disk client comprises: for any one of the plurality of write requests, sending, to the cloud disk client corresponding to that write request, the identifier of the data block replica corresponding to the first data block allocated to that write request, the address information of the target data storage node where that data block replica is located, and the data storage method, so that the cloud disk client corresponding to that write request writes the data to be written to the data block replica corresponding to the first data block according to the data storage method, based on the identifier of that data block replica and the address information of the target data storage node where it is located.
18. The method according to any one of claims 10-17, further comprising: obtaining a read request sent by the cloud disk client; obtaining, in response to the read request, the address information of a data block to be read by the read request; and returning the address information of the data block to be read to the cloud disk client, so that the cloud disk client reads the data from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
19. The method according to claim 18, wherein obtaining, in response to the read request, the address information of the data block to be read comprises: obtaining, from the read request, a target LBA and a target data length corresponding to the data to be read; determining, from the stored second correspondence among starting LBAs, identifiers of successfully written data blocks, and offsets within the successfully written data blocks, the identifier of a second data block corresponding to the target LBA and the offset within the second data block; and using the quadruple consisting of the target LBA, the target data length, the identifier of the second data block, and the offset within the second data block as the address information of the data block to be read.
20. A data reading method, applicable to a cloud disk client in a computing cluster, the method comprising: sending a read request to a second storage service node among a plurality of storage service nodes, wherein the second storage service node is the storage service node that serves a second cloud disk to be read by the read request, the plurality of storage service nodes belong to a storage service system in a storage cluster, the storage cluster further comprises a data storage system, and the data storage system comprises the data storage node where a data block to be read is located; obtaining the address information, returned by the second storage service node, of the data block to be read by the read request; and reading data from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
21. A data reading method, applicable to a storage service node in a storage service system, wherein the storage service system belongs to a storage cluster, the storage cluster further comprises a data storage system, and the data storage system comprises a data storage node for storing data; the method comprises: obtaining a read request sent by a cloud disk client in a computing cluster; obtaining, in response to the read request, the address information of a data block to be read, the data block to be read being located in the data storage node; and returning the address information of the data block to be read to the cloud disk client, so that the cloud disk client reads the data from the data storage node where the data block to be read is located, based on the address information of the data block to be read.
22. An electronic device, comprising: a memory, a processor, and a communication component; wherein the memory is configured to store a computer program; and the processor is coupled to the memory and the communication component and is configured to execute the computer program to perform the steps of the method according to any one of claims 3-21.
23. A computer-readable storage medium storing computer instructions, wherein, when the computer instructions are executed by one or more processors, the one or more processors are caused to perform the steps of the method according to any one of claims 3-21.
24. A computer program product, comprising a computer program that, when executed by one or more processors, causes the one or more processors to perform the steps of the method according to any one of claims 3-21.
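As an illustration of the client-direct write path in claims 1, 3 and 10, the following Python sketch models the three roles in memory. It is a minimal sketch under stated assumptions, not the disclosed implementation: all class and function names (`MetadataStorageNode`, `StorageServiceNode`, `DataStorageNode`, `client_write`) are hypothetical, network calls are replaced by method calls, and replica selection simply takes the first replicas listed for the disk. What it shows is that only metadata passes through the storage service node, while the payload goes straight to the data storage nodes.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStorageNode:
    # Hypothetical map: replica id -> address of the data storage node holding it.
    replica_locations: dict

    def locate(self, replica_ids):
        return {rid: self.replica_locations[rid] for rid in replica_ids}


@dataclass
class DataStorageNode:
    address: str
    replicas: dict = field(default_factory=dict)  # replica id -> bytearray

    def write(self, replica_id, offset, data):
        buf = self.replicas.setdefault(replica_id, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(bytes(end - len(buf)))  # zero-fill any gap
        buf[offset:end] = data


@dataclass
class StorageServiceNode:
    disk_replicas: dict  # cloud disk id -> replica ids belonging to that disk
    meta: MetadataStorageNode

    def handle_write_metadata(self, disk_id, target_number):
        # Pick replicas for this write (placeholder policy: first N listed),
        # then ask the metadata storage node where they are located.
        chosen = self.disk_replicas[disk_id][:target_number]
        return chosen, self.meta.locate(chosen)


def client_write(service, data_nodes, disk_id, data, target_number=3, offset=0):
    # Only metadata goes to the storage service node; the payload itself
    # is sent directly to each data storage node holding a chosen replica.
    replica_ids, addresses = service.handle_write_metadata(disk_id, target_number)
    for rid in replica_ids:
        data_nodes[addresses[rid]].write(rid, offset, data)
    return replica_ids


data_nodes = {a: DataStorageNode(a) for a in ("dn1", "dn2", "dn3")}
meta = MetadataStorageNode({"r1": "dn1", "r2": "dn2", "r3": "dn3", "r4": "dn1"})
service = StorageServiceNode({"disk-a": ["r1", "r2", "r3", "r4"]}, meta)
written = client_write(service, data_nodes, "disk-a", b"hello")
```

Because the payload never passes through `StorageServiceNode`, that node only handles metadata and address replies, which is where the reduced bandwidth consumption described for this path comes from.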
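The ordered fallback of claims 5, 6 and 16 can be sketched as follows. The sketch assumes that performance parameter information is a per-replica dictionary with `latency_ms` and `status` keys and that a block ranks by its slowest replica. These names and the ranking metric are hypothetical; the claims only require sorting by performance parameter information, retrying in sorted order, and sealing the replicas whose writes failed.

```python
def rank_blocks(perf, replicas_of):
    """Claim 16 sketch: order blocks best-first, skipping any with a sealed replica."""
    usable = [b for b, rids in replicas_of.items()
              if all(perf[r]["status"] != "sealed" for r in rids)]
    # Assumed metric: a block is only as fast as its slowest replica.
    return sorted(usable, key=lambda b: max(perf[r]["latency_ms"] for r in replicas_of[b]))


def write_with_fallback(ranked, replicas_of, try_write):
    """Claims 5-6 sketch: try blocks in ranked order until one write succeeds."""
    failed = []
    for block in ranked:
        bad = [rid for rid in replicas_of[block] if not try_write(rid)]
        if not bad:
            return block, failed  # success; earlier failures are still reported
        failed.extend(bad)
    return None, failed  # every candidate block failed


def seal_replicas(perf, failed_ids):
    """Storage service node side of claim 6: mark reported replicas as sealed."""
    for rid in failed_ids:
        perf[rid]["status"] = "sealed"


perf = {
    "a1": {"latency_ms": 2.0, "status": "open"},
    "a2": {"latency_ms": 3.0, "status": "open"},
    "b1": {"latency_ms": 1.0, "status": "open"},
    "b2": {"latency_ms": 1.5, "status": "open"},
}
replicas_of = {"A": ["a1", "a2"], "B": ["b1", "b2"]}
ranked = rank_blocks(perf, replicas_of)  # B (slowest 1.5 ms) ahead of A (slowest 3.0 ms)
block, failed = write_with_fallback(ranked, replicas_of, lambda rid: rid != "b2")
seal_replicas(perf, failed)
```

Partial writes to the surviving replicas of a failed block (here `b1`) are not rolled back in this sketch; a real system would need to handle them.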
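Claims 8 and 9 switch between the direct path and a proxied path based on the client's number of connections to the data storage nodes. A minimal sketch of that decision follows. The threshold value, the function names, and the rule that a set data processing operation takes precedence over a low connection count are assumptions, since the claims state the two conditions as alternatives.

```python
from enum import Enum


class ReadFlag(Enum):
    FIRST = 1   # service node returns address info; client reads directly
    SECOND = 2  # service node reads the data itself on the client's behalf


def use_direct_path(connections, threshold, in_set_operation):
    # Assumed precedence: a set data processing operation always takes the
    # proxied path, even when the connection count is below the threshold.
    return connections < threshold and not in_set_operation


def write_route(connections, threshold, in_set_operation=False):
    # Direct path sends only write metadata; proxied path sends the full request.
    if use_direct_path(connections, threshold, in_set_operation):
        return "metadata-only"
    return "full-request"


def read_flag(connections, threshold, in_set_operation=False):
    if use_direct_path(connections, threshold, in_set_operation):
        return ReadFlag.FIRST
    return ReadFlag.SECOND


THRESHOLD = 64  # hypothetical connection limit
print(write_route(10, THRESHOLD), read_flag(80, THRESHOLD))
```

Keeping the condition in one helper guarantees that the write and read paths switch at the same connection count.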
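Claims 11 and 12 derive a first number of data blocks from the cloud disk's load, and a target number of replicas from that first number and the replica count of the data storage method. The sketch below assumes, for illustration only, that the target number is their product (plain replication), that load is measured in MB/s with one block per 200 MB/s, and that performance is mean replica latency. `first_number`, `target_number` and `select_replicas` are hypothetical names.

```python
import math


def first_number(load_mbps, per_block_mbps=200.0, max_blocks=8):
    # Hypothetical policy: one data block per 200 MB/s of current disk load, capped.
    return max(1, min(max_blocks, math.ceil(load_mbps / per_block_mbps)))


def target_number(first, replicas_per_block):
    # Assumed combination rule: first number times copies per block.
    return first * replicas_per_block


def select_replicas(blocks, load_mbps, replicas_per_block=3):
    """blocks: block id -> list of (replica id, latency_ms) pairs."""
    n = first_number(load_mbps)
    eligible = {b: r for b, r in blocks.items() if len(r) == replicas_per_block}
    # Lower mean replica latency ranks first (assumed performance metric).
    ranked = sorted(eligible, key=lambda b: sum(lat for _, lat in eligible[b]) / replicas_per_block)
    chosen = ranked[:n]
    return chosen, [rid for b in chosen for rid, _ in eligible[b]]


blocks = {
    "b1": [("b1r1", 5), ("b1r2", 5), ("b1r3", 5)],
    "b2": [("b2r1", 1), ("b2r2", 1), ("b2r3", 1)],
    "b3": [("b3r1", 9), ("b3r2", 9), ("b3r3", 9)],
}
chosen, replica_ids = select_replicas(blocks, load_mbps=350)
```

An erasure-coded data storage method would replace the per-block replica count with its fragment count, so `target_number` is the piece most likely to differ in practice.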
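The second correspondence of claim 15 and the quadruple lookup of claim 19 can be illustrated with a sorted index over starting LBAs. This sketch assumes 512-byte sectors and records each extent's length (taken from the write-success metadata) so that a lookup can check that the requested range is covered. It omits persistence and the handling of overlapping extents. `LbaMap`, `ReadAddress` and `SECTOR_BYTES` are hypothetical names.

```python
import bisect
from typing import NamedTuple

SECTOR_BYTES = 512  # assumed LBA granularity


class ReadAddress(NamedTuple):
    lba: int
    length: int
    block_id: str
    offset: int


class LbaMap:
    """Second correspondence: starting LBA -> (block id, offset within block)."""

    def __init__(self):
        self._starts = []   # sorted starting LBAs
        self._entries = {}  # starting LBA -> (block_id, offset, length_bytes)

    def record_write(self, start_lba, block_id, offset, length_bytes):
        if start_lba not in self._entries:
            bisect.insort(self._starts, start_lba)
        self._entries[start_lba] = (block_id, offset, length_bytes)

    def resolve(self, target_lba, length):
        # Find the latest extent starting at or before target_lba.
        i = bisect.bisect_right(self._starts, target_lba) - 1
        if i < 0:
            raise KeyError(target_lba)
        start = self._starts[i]
        block_id, offset, extent_len = self._entries[start]
        delta = (target_lba - start) * SECTOR_BYTES
        if delta + length > extent_len:
            raise KeyError(target_lba)  # range not covered by this extent
        return ReadAddress(target_lba, length, block_id, offset + delta)


lba_map = LbaMap()
lba_map.record_write(100, "blk-7", 8192, 4096)
lba_map.record_write(0, "blk-3", 0, 4096)
```

A read of LBA 102 for 1024 bytes resolves to `blk-7` at offset 8192 + 2 × 512, so the returned quadruple carries everything the client needs to address the data storage node directly.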
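Claim 14 updates replica performance information from the written data amount and the response latency, but does not specify the update rule. The sketch below substitutes an exponentially weighted moving average for latency and a running byte counter. The smoothing factor, the `ReplicaPerf` fields, and keying reports by replica id are all assumptions.

```python
from dataclasses import dataclass

ALPHA = 0.2  # assumed smoothing factor for the moving average


@dataclass
class ReplicaPerf:
    bytes_written: int = 0
    latency_ms: float = 0.0  # exponentially weighted write latency
    samples: int = 0
    status: str = "open"


def apply_write_success(perf, written_per_replica, latency_ms):
    """Fold one write-success report into per-replica performance info."""
    for rid, nbytes in written_per_replica.items():
        p = perf.setdefault(rid, ReplicaPerf())
        p.bytes_written += nbytes
        # The first sample seeds the average; later samples are blended in.
        if p.samples == 0:
            p.latency_ms = latency_ms
        else:
            p.latency_ms = (1 - ALPHA) * p.latency_ms + ALPHA * latency_ms
        p.samples += 1


perf = {}
apply_write_success(perf, {"r1": 4096, "r2": 4096}, latency_ms=10.0)
apply_write_success(perf, {"r1": 8192}, latency_ms=20.0)
```

A moving average lets a single slow write nudge a replica down the ranking without immediately excluding it; sealing (claim 6) remains the mechanism for outright failures.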
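Claim 17 leaves the load balancing strategy open. One possible choice, shown here only as an assumption, is greedy least-loaded assignment: each concurrent write goes to the data block with the fewest bytes assigned so far, tracked with a min-heap. `assign_writes` is a hypothetical name.

```python
import heapq


def assign_writes(write_sizes, blocks):
    """Assign each concurrent write (by size in bytes) to the least-loaded block."""
    # Heap entries: (bytes assigned so far, tie-break index, block id).
    heap = [(0, i, b) for i, b in enumerate(blocks)]
    heapq.heapify(heap)
    assignment = []
    for size in write_sizes:
        load, i, block = heapq.heappop(heap)
        assignment.append(block)
        heapq.heappush(heap, (load + size, i, block))
    return assignment


assignment = assign_writes([4096, 4096, 8192, 4096], ["b0", "b1"])
```

Each write request's assigned block then determines which replica identifier and data storage node address are returned to that request's cloud disk client.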