Data distribution method, apparatus and device, medium, and program product
By adopting a three-tier data distribution architecture with layered management, this application resolves the bottlenecks of concentrated network traffic and single-point distribution capacity between the image and snapshot data storage pool and the cloud disks, achieving load-balanced and traffic-balanced data distribution and improving data distribution efficiency.
Patent Information
- Authority / Receiving Office
- WO
- Patent Type
- Applications
- Current Assignee / Owner
- CHINA MOBILE (SUZHOU) SOFTWARE TECH CO LTD
- Filing Date
- 2025-11-27
- Publication Date
- 2026-06-11
Description
A data distribution method, apparatus, device, medium, and program product
[0001] Cross-references to related applications
[0002] This application is based on and claims priority to Chinese Patent Application No. 202411785592.2, filed on December 6, 2024, the entire contents of which are incorporated herein by reference.
Technical Field
[0003] This application relates to the field of computer technology, and specifically to a data distribution method, apparatus, device, medium, and program product.
Background
[0004] Currently, image and snapshot data are managed by storing them uniformly in a low-cost storage pool, while cloud disk creation is scheduled across different storage clusters. However, because data transfer between the image and snapshot data storage pool and the storage clusters hosting the cloud disks requires network communication, data distribution suffers from concentrated network traffic and single-point distribution capacity bottlenecks.
Summary of the Invention
[0005] The purpose of this application is to provide a data distribution method, apparatus, device, medium, and program product.
[0006] This application provides a data distribution method, including:
[0007] Obtain the target disk creation instruction, which instructs the download of target data to the target disk storage cluster, the target data including multiple target data blocks;
[0008] If the data cache layer stores the target data block to be downloaded, then the target data block is downloaded from the data cache layer to the target disk storage cluster;
[0009] If the target data block is not stored in the data cache layer, the target data block is downloaded from the data distribution layer or data warehouse to the target disk storage cluster.
[0010] Optionally, in the data distribution method, downloading the target data block from the data distribution layer or data warehouse to the target disk storage cluster includes:
[0011] If the data distribution layer stores the target data block, then the target data block is downloaded from the data distribution layer to the target disk storage cluster;
[0012] If the data distribution layer does not store the target data block, then the target data block is downloaded from the data warehouse to the target disk storage cluster.
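The lookup order in [0008] to [0012] amounts to a fallback across three tiers. The following Python sketch illustrates it under a simplifying assumption not stated in the application: each tier is modeled as an in-memory set of the block IDs it holds. The function and parameter names (`locate_source`, `cache_layer`, `distribution_layer`, `warehouse`) are hypothetical.

```python
from typing import Set


def locate_source(block_id: str,
                  cache_layer: Set[str],
                  distribution_layer: Set[str],
                  warehouse: Set[str]) -> str:
    """Return the tier a target data block should be downloaded from.

    Hypothetical model: each tier is a set of the block IDs it holds.
    The data cache layer is tried first, then the data distribution
    layer, and the data warehouse only as a last resort.
    """
    if block_id in cache_layer:
        return "cache"
    if block_id in distribution_layer:
        return "distribution"
    if block_id in warehouse:
        return "warehouse"
    raise KeyError(f"block {block_id!r} not found in any tier")


# Example: block "b2" is held by the distribution layer but not the cache layer.
print(locate_source("b2", {"b1"}, {"b1", "b2"}, {"b1", "b2", "b3"}))  # distribution
```

Checking the cheapest, most widely replicated tier first is what keeps warehouse traffic to a minimum: the warehouse is only touched for blocks no storage-side cluster holds yet.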
[0013] Optionally, in the data distribution method, the data cache layer includes at least one first storage cluster, each of which is used to download and store all or part of the target data from the data distribution layer.
[0014] The step of downloading the target data block from the data cache layer to the target disk storage cluster includes:
[0015] Based on the current load and traffic information of the data cache layer, a data access routing table is constructed with the goal of load balancing and traffic balancing. The data access routing table indicates the target first storage cluster in the data cache layer that stores the target data block.
[0016] Obtain the target first storage cluster according to the data access routing table;
[0017] Download the target data block from the target first storage cluster to the target disk storage cluster.
[0018] Optionally, the data distribution method further includes:
[0019] When all target data is downloaded to the target disk storage cluster, the target disk storage cluster is added to the data cache layer.
[0020] Optionally, in the data distribution method, the data distribution layer includes at least one second storage cluster, each of which is used to download and store all or part of the target data from the data warehouse.
[0021] The method further includes:
[0022] If all the target data is downloaded to the target disk storage cluster, the target disk storage cluster is added to the second storage cluster candidate sequence.
[0023] If the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster in the candidate sequence of the second storage clusters is added to the data distribution layer.
[0024] Optionally, in the data distribution method, the data distribution layer includes at least one second storage cluster, each of which is used to download and store all or part of the target data from the data warehouse.
[0025] The method further includes:
[0026] If the target data is downloaded to the target disk storage cluster in full, and the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster is added to the data distribution layer.
[0027] Optionally, in the data distribution method, before the target data block is downloaded from the data cache layer to the target disk storage cluster (when the data cache layer stores the target data block to be downloaded), the method further includes:
[0028] Determine whether the target disk storage cluster stores the target data block;
[0029] If the target disk storage cluster does not store the target data block, then it is determined whether the target data block is stored in the data cache layer.
[0030] Optionally, in the data distribution method, before the target data block is downloaded from the data cache layer to the target disk storage cluster (when the data cache layer stores the target data block to be downloaded), the method further includes:
[0031] Determine whether the target data is stored in the data warehouse;
[0032] If the target data is stored in the data warehouse, then determine whether the target data block is stored in the data cache layer.
[0033] An embodiment of this application also provides a data distribution apparatus, including:
[0034] The acquisition module is configured to acquire a target disk creation instruction, which is used to instruct the download of target data to the target disk storage cluster. The target data includes multiple target data blocks.
[0035] The first download module is configured to download the target data block to the target disk storage cluster from the data cache layer if the target data block to be downloaded is stored in the data cache layer.
[0036] The second download module is configured to download the target data block from the data distribution layer or data warehouse to the target disk storage cluster if the target data block is not stored in the data cache layer.
[0037] This application also provides a data distribution device, including: a processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the data distribution method as described in the first aspect.
[0038] This application also provides a readable storage medium storing a program that, when executed by a processor, implements the data distribution method as described in the first aspect.
[0039] This application also provides a computer program product, including computer instructions, which, when executed by a processor, implement the data distribution method as described in the first aspect.
[0040] The beneficial effects of the above technical solutions in the embodiments of this application are as follows:
[0041] The data distribution method described in this application downloads target data, which includes multiple target data blocks, to the target disk storage cluster according to the obtained target disk creation instruction. If a target data block to be downloaded is stored in the data cache layer, it is downloaded directly from the data cache layer to the target disk storage cluster; otherwise, it is downloaded to the target disk storage cluster from the data distribution layer or the data warehouse. By managing the target data hierarchically across a fully interconnected data cache layer, the data distribution layer, and the data warehouse, the method minimizes access traffic to the data warehouse, so that target disk read/write operations are not affected by the bandwidth limits of the data warehouse or by the low-speed network between the data warehouse and the target disk storage cluster. It also reduces access traffic to the data distribution layer, thereby solving the problems of concentrated network traffic and single-point distribution capacity bottlenecks.
Brief Description of the Drawings
[0042] Figure 1 is a schematic diagram of the application system architecture of the data distribution method provided in the embodiment of this application;
[0043] Figure 2 is a flowchart illustrating the data distribution method provided in an embodiment of this application;
[0044] Figure 3 is a flowchart illustrating an implementation of the data distribution method provided in this application;
[0045] Figure 4 is a schematic diagram of the data distribution device provided in an embodiment of this application;
[0046] Figure 5 is a hardware block diagram of the data distribution device provided in an embodiment of this application.
Detailed Description
[0047] To make the technical problems, technical solutions and advantages of this application clearer, a detailed description will be provided below in conjunction with the accompanying drawings and specific embodiments.
[0048] Before providing a detailed description of the technical solutions in the embodiments of this application, a brief description of the relevant technologies will be given first.
[0049] Traditional physical servers and cloud-based virtual machine instances typically mount a disk (called a cloud disk in cloud environments) to store the data required by the operating system; this disk is called the system disk. Furthermore, most servers and virtual machine instances also mount one or more disks to store application data; these are called data disks. The system disk requires pre-configuration to load the operating system's configuration and data, while the data disk can be created as an empty disk to store data subsequently written by applications, or it can use the same method as the system disk to pre-load data.
[0050] An image is a software package containing the configuration and data required for an operating system to boot and run. It can contain only data from a single system disk, or it can contain data from the system disk and multiple data disks. Its underlying data is typically stored on low-cost cold storage media (such as object storage). When the image is used to create a server or virtual machine instance, the data from the system disk and data disks in the image is imported into the corresponding system disk and data disks mounted on the server or virtual machine instance. Subsequent booting and running of the server and virtual machine instance are independent of the image; data and configuration are directly obtained from the mounted disks.
[0051] Besides the scenarios of creating and starting server and virtual machine instances, there are also scenarios of mounting new data disks on existing running server or virtual machine instances, i.e., system expansion. The new data disk can be an empty disk or a disk pre-loaded with data. Pre-loaded data storage is generally achieved through snapshots, and its data storage and usage are the same as images.
[0052] Therefore, both server or virtual machine instance creation and startup scenarios, as well as system expansion scenarios, can be abstracted as a model of loading image or snapshot data onto a target disk. Related technologies manage image and snapshot data by storing them uniformly in a low-cost storage pool, while cloud disk creation is scheduled across different storage clusters. However, because data transfer between the image and snapshot data storage pool and the storage cluster hosting the cloud disk requires network communication, this management method suffers from network traffic concentration and single-point distribution bottlenecks during data distribution.
[0053] Based on this, this application proposes a data distribution method. Figure 1 is a schematic diagram of the application system architecture of the data distribution method provided in this embodiment. As shown in Figure 1, the system architecture includes at least three layers: a data warehouse, a data distribution layer, and a data cache layer. The data warehouse stores the full set of image data and snapshot data and can therefore also be called an image registry. The data distribution layer can also be called a distribution group, and the data cache layer can also be called a block cache layer.
[0054] The data warehouse, typically a third-party service, provides the original storage of image and snapshot data, with image storage and shared access capabilities across the entire region. It contains all image and snapshot data: newly created data is first stored here, making it the starting point for subsequent distribution. All image and snapshot data is initially retrieved from this layer, and the same data needs to be read from the data warehouse only once, when the data distribution layer is built; subsequent access stays within the high-speed interconnected storage clusters on the storage side.
[0055] The data distribution layer addresses the traffic bottleneck of the data warehouse and the low-speed network between the data warehouse and the disk (or cloud disk) storage clusters (i.e., the storage clusters where the disks reside), so that disk (or cloud disk) performance is not affected. As a storage-side replica of image and snapshot data, the data distribution layer holds the full set of image and snapshot data, which is used for subsequently caching image and snapshot data in storage clusters and for distributing disk (or cloud disk) data. The data distribution layer operates at the cluster level and comprises multiple storage clusters. Each storage cluster stores a full or partial copy of the specified image and snapshot data, and together the storage clusters hold the full set. The flow of image and snapshot data on the storage side begins at the data distribution layer. When image and snapshot data are not yet cached in a given storage cluster, or when the data cache layer contains too few storage clusters, the data distribution layer supplies part of the traffic for downloading image and snapshot data to other storage clusters. Once the image data is sufficiently cached in the data cache layer, the traffic is spread across the storage clusters of the data cache layer, and their combined concurrent processing capacity avoids the bandwidth and processing bottleneck that arises when a single storage cluster serves the traffic of the entire system.
[0056] It should be noted that the data distribution layer uses a flexible distribution strategy. Each storage cluster in the data distribution layer may hold the full image data, providing data replication capability, or only part of it; access to the full image data is then achieved through unified data access scheduling, which reduces storage cost while keeping traffic balanced. Storage clusters join and leave the data distribution layer under several constraints, including the number of storage clusters and their cache expiration periods. While the number of storage clusters in the data distribution layer is below the maximum member limit, new storage clusters may be added to share access traffic and processing load. Once the maximum is reached, the cache expiration period of each storage cluster is evaluated; when a new storage cluster is to be added, it replaces the storage cluster whose cache has been expired the longest among those that have exceeded their cache expiration period. The replaced storage cluster releases its image cache data to minimize the impact on the cluster's saleable space.
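The membership rule in [0056] can be sketched as follows. This is a hypothetical model, not the applicant's implementation: each member carries an absolute cache expiry time, `max_members` plays the role of the maximum member limit, and the eviction choice reflects one reading of the text, namely that the member whose cache has been expired the longest (earliest `expires_at`) is replaced first.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Member:
    name: str
    expires_at: float  # hypothetical: absolute time when this cluster's cache period ends


@dataclass
class DistributionLayer:
    max_members: int  # the maximum member limit (quantity threshold)
    members: List[Member] = field(default_factory=list)

    def try_add(self, new: Member, now: float) -> Tuple[bool, Optional[str]]:
        """Try to admit `new`; return (admitted, name of evicted member or None)."""
        # Below the limit: admit directly to spread traffic and processing load.
        if len(self.members) < self.max_members:
            self.members.append(new)
            return True, None
        # At the limit: only a member whose cache period has expired may be replaced.
        expired = [m for m in self.members if m.expires_at <= now]
        if not expired:
            return False, None
        # Replace the member that has been expired the longest (earliest expiry).
        victim = min(expired, key=lambda m: m.expires_at)
        self.members.remove(victim)
        self.members.append(new)
        return True, victim.name  # the victim would then release its image cache


layer = DistributionLayer(max_members=2)
print(layer.try_add(Member("A", expires_at=10), now=0))   # (True, None)
print(layer.try_add(Member("B", expires_at=5), now=0))    # (True, None)
print(layer.try_add(Member("C", expires_at=20), now=1))   # (False, None)
print(layer.try_add(Member("D", expires_at=30), now=12))  # (True, 'B')
```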
[0057] The data cache layer addresses concentrated data access traffic, the traffic and processing bottlenecks of distributing data from a single cluster, and the loss of saleable space caused by full data caches occupying cluster space. As a distributed cache of storage-side image and snapshot data blocks, the data cache layer holds full or partial image and snapshot data and is used for data distribution when disks (or cloud disks) are created from image and snapshot data. The data cache layer also operates at the cluster level. Its storage clusters form a fully interconnected data distribution mesh in which every storage cluster has equal status. Cached image data can come from the data distribution layer or from other storage clusters in the data cache layer, and storage clusters that cache partial data share their image and snapshot data caches with one another. When a cloud disk is created, a data access routing table is constructed according to a specified scheduling strategy, selecting storage clusters so as to balance traffic and load. Each storage cluster in the data access routing table holds part of the cached image and snapshot data, and the data for a disk created from an image is downloaded concurrently from all storage clusters in the routing table to assemble the full image data.
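The concurrent download described in [0057] can be illustrated with Python's `concurrent.futures.ThreadPoolExecutor`. In this sketch the routing table is assumed to map a block index to the name of the cluster that serves it, and `fetch` is a hypothetical stand-in for whatever transfer call the storage system actually provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def assemble_image(routing_table: Dict[int, str],
                   fetch: Callable[[str, int], bytes],
                   max_workers: int = 8) -> bytes:
    """Download every block concurrently from the cluster the routing table
    assigns to it, then concatenate the blocks in index order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {idx: pool.submit(fetch, cluster, idx)
                   for idx, cluster in routing_table.items()}
        blocks = {idx: fut.result() for idx, fut in futures.items()}
    return b"".join(blocks[idx] for idx in sorted(blocks))


# Hypothetical in-memory "clusters", each caching part of a 4-block image.
store = {"c1": {0: b"AA", 2: b"CC"}, "c2": {1: b"BB", 3: b"DD"}}
table = {0: "c1", 1: "c2", 2: "c1", 3: "c2"}
print(assemble_image(table, lambda cluster, idx: store[cluster][idx]))  # b'AABBCCDD'
```

Blocks may finish in any order, so the sketch reassembles them by index rather than by completion time.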
[0058] It should be noted that storage clusters join and leave the data cache layer as image data and snapshot data caches are created and released. For example, a storage cluster can join the data distribution layer after it has cached all of the image data.
[0059] It should be noted that the application system architecture of the data distribution method in this embodiment is constructed dynamically. Taking newly added image data as an example, the dynamic construction process is as follows:
[0060] First, because the new image data is not yet cached anywhere in the application system, the data distribution layer is empty. The download triggered when a disk is created from the image therefore goes directly to the data warehouse. At the same time, the downloaded image data is cached in the current storage cluster, which becomes the data distribution layer and provides a high-speed access channel for subsequent access to the image data.
[0061] Then, through the unified multi-cluster management service, the storage cluster in the data distribution layer initiates partial data caching operations on the other storage clusters. The image data is evenly scheduled and distributed among these storage clusters for caching, and the storage clusters holding partial cached data are added to the data distribution layer until the number of storage clusters reaches the upper limit, at which point construction of the data distribution layer is complete.
[0062] Next, the image data continues to be cached block by block across the storage clusters according to a load balancing algorithm. Each storage cluster has a certain degree of cached-data redundancy, which allows access traffic for data blocks to be distributed. These storage clusters share data to form a mesh-like data cache layer, and each decides whether to apply to join the data distribution layer based on its data caching status. If a storage cluster that has completed caching fails to join the data distribution layer, its cache space is cleared after a retention period to release cluster space. This process repeats, maintaining a dynamic balance in the multi-layer data distribution structure.
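Two steps of the dynamic construction in [0061] and [0062] lend themselves to a short sketch: spreading image blocks evenly across storage clusters for partial caching, and clearing the caches of clusters that finished caching but were not admitted to the data distribution layer. Round-robin assignment is used here only as a simple stand-in for "evenly scheduled"; the application does not name a scheduling algorithm. `spread_blocks`, `CachedCluster`, `sweep` and the `retention` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


def spread_blocks(block_ids: List[str], clusters: List[str]) -> Dict[str, List[str]]:
    """Assign image blocks to clusters round-robin for partial caching."""
    plan: Dict[str, List[str]] = {c: [] for c in clusters}
    for i, block in enumerate(block_ids):
        plan[clusters[i % len(clusters)]].append(block)
    return plan


@dataclass
class CachedCluster:
    name: str
    completed_at: float        # time the cluster finished caching the full image
    joined_distribution: bool  # whether its application to join was accepted
    cache_cleared: bool = False


def sweep(clusters: List[CachedCluster], now: float, retention: float) -> List[str]:
    """Clear caches of clusters that finished caching, did not join the
    distribution layer, and have outlived the retention period."""
    cleared = []
    for c in clusters:
        if (not c.joined_distribution and not c.cache_cleared
                and now - c.completed_at > retention):
            c.cache_cleared = True  # stands in for releasing the cache space
            cleared.append(c.name)
    return cleared


print(spread_blocks(["b0", "b1", "b2", "b3", "b4"], ["A", "B"]))
# {'A': ['b0', 'b2', 'b4'], 'B': ['b1', 'b3']}
pool = [CachedCluster("A", 0, True), CachedCluster("B", 0, False),
        CachedCluster("C", 50, False)]
print(sweep(pool, now=100, retention=60))  # ['B']
```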
[0063] To address the concentrated network traffic and single-point distribution capacity bottlenecks of existing data distribution technologies, this application provides a data distribution method. The method obtains a target disk creation instruction, which instructs that target data be downloaded to a target disk storage cluster; the target data includes multiple target data blocks. If a target data block to be downloaded is stored in the data cache layer, it is downloaded from the data cache layer to the target disk storage cluster; if not, it is downloaded from the data distribution layer or the data warehouse. By managing the target data in layers across the fully interconnected data cache layer, the data distribution layer, and the data warehouse, this implementation minimizes access traffic to the data warehouse, so that target disk read/write operations are not affected by the bandwidth limits of the data warehouse or by the low-speed network between the data warehouse and the target disk storage cluster. It also reduces access traffic to the data distribution layer, thereby solving the problems of concentrated network traffic and single-point distribution capacity bottlenecks in data distribution.
[0064] Figure 2 is a flowchart illustrating the data distribution method provided in an embodiment of this application. As shown in Figure 2, this embodiment of the application provides a data distribution method, including:
[0065] S201, Obtain the target disk creation instruction. The target disk creation instruction is used to instruct the download of target data to the target disk storage cluster. The target data includes multiple target data blocks.
[0066] It should be noted that this embodiment does not specifically limit the target disk. For example, if the target disk is a cloud disk, the target disk creation instruction is a cloud disk creation instruction; if the target disk is a hard disk, it is a hard disk creation instruction. The target disk creation instruction instructs that the full target data be downloaded to the target disk storage cluster and copied to the target disk. The target data includes image data or snapshot data.
[0067] In some embodiments, after S201 and before S202, the method further includes:
[0068] Determine whether the target data is stored in the data warehouse;
[0069] If the target data is stored in the data warehouse, then determine whether the target data block is stored in the data cache layer.
[0070] In some embodiments, after S201 and before S202, the method further includes:
[0071] Determine whether the target disk storage cluster stores the target data block;
[0072] If the target disk storage cluster does not store the target data block, then it is determined whether the target data block is stored in the data cache layer.
[0073] In some embodiments, after S201 and before S202, the method further includes:
[0074] Determine whether the target data is stored in the data warehouse;
[0075] If the target data is stored in the data warehouse, then determine whether the target disk storage cluster stores the target data block;
[0076] If the target disk storage cluster does not store the target data block, then it is determined whether the target data block is stored in the data cache layer.
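The pre-checks in [0073] to [0076] can be sketched as a single guard function. Modeling the data warehouse as a set of image IDs and the target disk storage cluster as a set of block IDs is an assumption, as is raising `LookupError` when the target data is absent from the warehouse; the application does not say how that case is handled. `needs_remote_lookup` is a hypothetical name.

```python
from typing import Set


def needs_remote_lookup(image_id: str, block_id: str,
                        warehouse_images: Set[str],
                        local_blocks: Set[str]) -> bool:
    """Return True only when the data cache layer should be queried for this block."""
    if image_id not in warehouse_images:
        # Assumption: unknown target data is treated as an error.
        raise LookupError(f"image {image_id!r} not in data warehouse")
    # A block already present in the target disk storage cluster needs no download.
    return block_id not in local_blocks


print(needs_remote_lookup("img-1", "b7", {"img-1"}, {"b0"}))  # True
print(needs_remote_lookup("img-1", "b0", {"img-1"}, {"b0"}))  # False
```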
[0077] S202, if the data cache layer stores the target data block to be downloaded, then download the target data block from the data cache layer to the target disk storage cluster.
[0078] It can be understood that S202 includes:
[0079] Determine whether the target data block is stored in the data cache layer;
[0080] If the data cache layer stores the target data block to be downloaded, then the target data block is downloaded from the data cache layer to the target disk storage cluster.
[0081] In some embodiments, the data cache layer includes at least one first storage cluster, each of which is used to download and store all or part of the target data from the data distribution layer.
[0082] The step of downloading the target data block from the data cache layer to the target disk storage cluster includes:
[0083] Based on the current load and traffic information of the data cache layer, a data access routing table is constructed with the goal of load balancing and traffic balancing. The data access routing table indicates the target first storage cluster in the data cache layer that stores the target data block.
[0084] Obtain the target first storage cluster according to the data access routing table;
[0085] Download the target data block from the target first storage cluster to the target disk storage cluster.
[0086] In this embodiment, when downloading the target data block from the data cache layer, it is necessary to first obtain the target first storage cluster where the target data block is stored according to the data access routing table, and then download the target data block from the target first storage cluster to the target disk storage cluster.
[0087] Because the data access routing table is constructed from the load and traffic information of each first storage cluster in the current data cache layer, with load balancing and traffic balancing as the goal, it prevents any single first storage cluster from being overloaded with load or traffic, thereby balancing load and traffic across the layer.
[0088] Specifically, the data access routing table includes the target first storage cluster corresponding to each target data block.
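One possible way to build the data access routing table of [0083] to [0088] is a greedy assignment: each block goes to the least-loaded first storage cluster that holds it, and that cluster's load score is then increased. This is a sketch under stated assumptions, not the applicant's scheduling strategy: load and traffic are folded into a single numeric score per cluster, and each block adds a fixed `block_cost`. All names are hypothetical.

```python
from typing import Dict, Set


def build_routing_table(holders: Dict[str, Set[str]],
                        load: Dict[str, float],
                        block_cost: float = 1.0) -> Dict[str, str]:
    """Map each block to a first storage cluster that holds it, greedily
    choosing the cluster with the lowest current load score."""
    load = dict(load)  # copy, so the caller's figures are not modified
    table: Dict[str, str] = {}
    for block in sorted(holders):
        candidates = sorted(holders[block])  # sorted for deterministic ties
        if not candidates:
            continue  # not cached: fall back to the distribution layer / warehouse
        best = min(candidates, key=lambda c: load.get(c, 0.0))
        table[block] = best
        load[best] = load.get(best, 0.0) + block_cost  # charge the chosen cluster
    return table


holders = {"b0": {"A", "B"}, "b1": {"A", "B"}, "b2": {"A"}}
print(build_routing_table(holders, {"A": 0.0, "B": 0.0}))
# {'b0': 'A', 'b1': 'B', 'b2': 'A'}
```

Charging the chosen cluster after each assignment is what spreads consecutive blocks across holders instead of sending them all to whichever cluster started out least loaded.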
[0089] It should be noted that, in this embodiment of the application, the target data includes multiple target data blocks that can be downloaded concurrently.
[0090] S203, if the target data block is not stored in the data cache layer, then download the target data block from the data distribution layer or data warehouse to the target disk storage cluster.
[0091] It can be understood that S203 includes:
[0092] Determine whether the target data block is stored in the data cache layer;
[0093] If the target data block is not stored in the data cache layer, the target data block is downloaded from the data distribution layer or data warehouse to the target disk storage cluster.
[0094] In some embodiments, downloading the target data block from the data distribution layer or data warehouse to the target disk storage cluster includes:
[0095] Determine whether the data distribution layer stores the target data block;
[0096] If the data distribution layer stores the target data block, then the target data block is downloaded from the data distribution layer to the target disk storage cluster;
[0097] If the data distribution layer does not store the target data block, then the target data block is downloaded from the data warehouse to the target disk storage cluster.
[0098] It can be understood that the data distribution method described in this application downloads the target data block from the data warehouse to the target disk storage cluster only when neither the data cache layer nor the data distribution layer holds it. This minimizes access traffic to the data warehouse and prevents the bandwidth limits of the data warehouse and the low-speed network between the data warehouse and the target disk storage cluster from affecting target disk read and write operations.
[0099] In some embodiments, the method further includes:
[0100] When all target data is downloaded to the target disk storage cluster, the target disk storage cluster is added to the data cache layer.
[0101] It should be noted that after a target data block is downloaded to the target disk storage cluster, it is determined whether any target data blocks remain to be downloaded. If so, steps S202 and S203 are executed to download them to the target disk storage cluster. If none remain, that is, once all of the target data has been downloaded to the target disk storage cluster, the target disk storage cluster is added to the data cache layer, where it serves as a data cache replica that shares the load and traffic of subsequent access to the same target data until its cache expires and it exits the data cache layer.
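The loop in [0101], followed by joining the data cache layer and later leaving it when the cache expires, might look like this. The `CacheLayer` registry, the time-to-live `ttl` used to model the cache period, and the `download` callback standing in for S202/S203 are hypothetical.

```python
from typing import Callable, Dict, List


class CacheLayer:
    """Hypothetical registry of clusters holding a complete replica of one image."""

    def __init__(self) -> None:
        self.expiry: Dict[str, float] = {}

    def join(self, cluster: str, now: float, ttl: float) -> None:
        self.expiry[cluster] = now + ttl

    def prune(self, now: float) -> List[str]:
        """Remove and return clusters whose cache period has ended."""
        gone = [c for c, t in self.expiry.items() if t <= now]
        for c in gone:
            del self.expiry[c]
        return gone


def create_disk(blocks: List[str], download: Callable[[str], None],
                layer: CacheLayer, target: str, now: float, ttl: float) -> None:
    pending = list(blocks)
    while pending:                 # still blocks to download?
        download(pending.pop(0))   # run S202 / S203 for the next block
    layer.join(target, now, ttl)   # all target data is local: join the cache layer


layer = CacheLayer()
fetched: List[str] = []
create_disk(["b0", "b1"], fetched.append, layer, "T", now=0.0, ttl=10.0)
print(fetched, layer.expiry)  # ['b0', 'b1'] {'T': 10.0}
print(layer.prune(now=10.0))  # ['T']
```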
[0102] In some embodiments, the data distribution layer includes at least one second storage cluster, each of which is used to download and store full or partial target data from the data warehouse.
[0103] The method further includes:
[0104] If all the target data is downloaded to the target disk storage cluster, the target disk storage cluster is added to the second storage cluster candidate sequence.
[0105] If the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster in the candidate sequence of the second storage clusters is added to the data distribution layer.
[0106] In this embodiment, after the target data block is downloaded to the target disk storage cluster, it is determined whether there are still target data blocks to be downloaded. If there are still target data blocks to be downloaded, then according to the above S202 and S203, the target data blocks to be downloaded are downloaded to the target disk storage cluster; if there are no target data blocks to be downloaded, it means that the entire target data has been downloaded to the target disk storage cluster.
[0107] Then, after all the target data has been downloaded to the target disk storage cluster, the target disk storage cluster is added to the second storage cluster candidate sequence corresponding to the data distribution layer.
[0108] Next, it is determined whether the number of second storage clusters in the data distribution layer is less than the number threshold, or whether the cache period of the target second storage cluster in the data distribution layer has expired. If the number of second storage clusters in the data distribution layer is less than the number threshold, or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster in the candidate sequence of the second storage cluster is added to the data distribution layer, that is, the target disk storage cluster is used as the second storage cluster to share the load and traffic of subsequent access to the same target data.
[0109] It should be noted that after the target disk storage cluster is added to the second storage cluster candidate sequence, if the cache period of the target disk storage cluster has expired and the target disk storage cluster is a storage cluster added to the data distribution layer, then the target disk storage cluster is removed from the second storage cluster candidate sequence.
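The candidate-sequence embodiment in [0102] to [0109] can be sketched as below. Two details are assumptions rather than statements from the application: candidates are promoted in first-in, first-out order, and an expired target second storage cluster is removed to make room for the promoted candidate. `promote_candidate` and its parameters are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


def promote_candidate(candidates: Deque[str], members: List[str], threshold: int,
                      expired_member: Optional[str] = None) -> Optional[str]:
    """Promote the oldest candidate into the data distribution layer.

    `members` lists the second storage clusters, `threshold` is the quantity
    threshold, and `expired_member` names the target second storage cluster
    whose cache period has expired (if any).
    """
    if not candidates:
        return None                      # nothing waiting in the candidate sequence
    if len(members) >= threshold:
        if expired_member is None or expired_member not in members:
            return None                  # layer is full and nothing has expired
        members.remove(expired_member)   # assumption: the expired cluster makes room
    new = candidates.popleft()           # assumption: FIFO candidate order
    members.append(new)
    return new


cands = deque(["T1", "T2"])
dist_layer = ["S1"]
print(promote_candidate(cands, dist_layer, threshold=2))                       # T1
print(promote_candidate(cands, dist_layer, threshold=2))                       # None
print(promote_candidate(cands, dist_layer, threshold=2, expired_member="S1"))  # T2
print(dist_layer)                                                              # ['T1', 'T2']
```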
[0110] In some embodiments, the method further includes:
[0111] If the target data is downloaded to the target disk storage cluster in full, and the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster is added to the data distribution layer.
[0112] In this embodiment, after the target data block is downloaded to the target disk storage cluster, it is determined whether there are still target data blocks to be downloaded. If there are still target data blocks to be downloaded, then the above S202 and S203 are executed to download the target data blocks to the target disk storage cluster. If there are no target data blocks to be downloaded, it means that the entire target data has been downloaded to the target disk storage cluster.
[0113] Then, once all the target data has been downloaded to the target disk storage cluster, it is determined whether the number of second storage clusters in the data distribution layer is less than the number threshold, or whether the cache period of the target second storage cluster in the data distribution layer has expired.
[0114] If the number of second storage clusters in the data distribution layer is less than the number threshold, or if the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster is added to the data distribution layer, that is, the target disk storage cluster is used as the second storage cluster to share the load and traffic of subsequent access to the same target data.
[0115] It should be noted that if the number of second storage clusters in the data distribution layer is greater than or equal to the number threshold, and the cache period of the target second storage cluster in the data distribution layer has not expired, then the target disk storage cluster serves as a data cache copy until the cache expires.
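The decision in [0113] to [0115] reduces to a single predicate. The sketch below makes two outcomes explicit: the OR condition under which the cluster joins the data distribution layer, and its negation, under which the cluster remains a cache copy. The function name and the returned labels are hypothetical.

```python
def place_completed_cluster(member_count: int, count_threshold: int,
                            target_member_expired: bool) -> str:
    """Role of a disk storage cluster that now holds the full target data.

    Hypothetical helper illustrating [0113]-[0115]; names are not from the source.
    """
    # [0114]: join if the layer is below the threshold OR the target member expired.
    if member_count < count_threshold or target_member_expired:
        return "distribution_layer"
    # [0115]: count >= threshold AND not expired, so act as a cache copy until expiry.
    return "cache_copy"
```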
[0116] For example, Figure 3 is a flowchart illustrating an implementation of the data distribution method provided in this application; as shown in Figure 3, when the target disk is a cloud disk, this implementation includes the following steps:
[0117] S301, obtain the cloud disk creation instruction.
[0118] S302, determine whether the image data is stored in the data warehouse.
[0119] If the result of S302 is yes, then S303 is executed; if the result of S303 is no, then S314 is executed.
[0120] S303, determine if there are any more data blocks to be downloaded.
[0121] If the judgment result of S303 is yes, then S304 to S306 are executed.
[0122] S304, select a data block to download.
[0123] S305, query the data caching layer.
[0124] S306, determine whether the data cache layer stores the data block.
[0125] If the judgment result of S306 is yes, then S307 to S308 are executed; if the judgment result of S306 is no, then S309 is executed.
[0126] S307, obtain the storage cluster in the data cache layer that stores the data block.
[0127] Specifically, the storage cluster in the data cache layer that stores the data block is obtained according to the data access routing table, and S308 and S316 are executed simultaneously.
[0128] S308, download the data block from that storage cluster to the cloud disk storage cluster.
[0129] S309, determine whether the data distribution layer stores the data block.
[0130] If the result of S309 is yes, then S310 is executed; if the result of S309 is no, then S311 is executed.
[0131] S310, download the data block from the data distribution layer to the cloud disk storage cluster.
[0132] S311, download the data block from the data warehouse to the cloud disk storage cluster.
[0133] S312, determine whether the data block was downloaded successfully.
[0134] If the judgment result of S312 is yes, then S313 is executed.
[0135] S313, add the cloud disk storage cluster to the data caching layer.
[0136] Specifically, after the data block is successfully downloaded, the cloud disk storage cluster is added to the data cache layer, and S303 is executed to determine whether there are any more data blocks to be downloaded.
[0137] S314, determine whether the data distribution layer needs to be updated.
[0138] If the judgment result of S314 is yes, then S315 is executed; if the judgment result of S314 is no, then the data distribution is completed.
[0139] S315, add the cloud disk storage cluster to the data distribution layer.
[0140] S316, determine whether the cloud disk storage cluster holds the full image data.
[0141] If the result of S316 is yes, then S314 is executed.
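The Figure 3 flow can be traced with the simplified, single-threaded sketch below. It illustrates the flow under stated assumptions and is not the applicant's implementation:

- Layers are modelled as in-memory dictionaries of block IDs.
- Every download is assumed to succeed (S312).
- The first cache holder found stands in for the routing-table lookup of S307.
- S316 is evaluated once after the loop rather than concurrently with S308.
- The sketch simply returns when the image is absent from the data warehouse.
- The S314 "needs to be updated" test is replaced by a hypothetical `count_threshold` check.

All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tiers:
    warehouse: dict                                   # image id -> ordered list of block ids
    distribution: dict = field(default_factory=dict)  # second cluster -> set of block ids
    cache: dict = field(default_factory=dict)         # first cluster -> set of block ids


def find_holder(layer: dict, block: str) -> Optional[str]:
    """Return the first cluster in `layer` that holds `block`, or None."""
    for cluster, held in layer.items():
        if block in held:
            return cluster
    return None


def create_cloud_disk(tiers: Tiers, image: str, disk_cluster: str,
                      count_threshold: int = 3) -> list:
    """Fetch every block of `image` into `disk_cluster`; return (block, source) pairs."""
    if image not in tiers.warehouse:                 # S302: image not in the warehouse
        return []                                    # assumption: simply stop here
    log = []
    for block in tiers.warehouse[image]:             # S303/S304: next block to download
        cached = find_holder(tiers.cache, block)     # S305/S306: query the cache layer
        if cached is not None:
            source = "cache:" + cached               # S307/S308 (stand-in for routing table)
        else:
            dist = find_holder(tiers.distribution, block)                           # S309
            source = ("distribution:" + dist) if dist is not None else "warehouse"  # S310 / S311
        # Assumption: the download always succeeds (S312), so the disk cluster is
        # added to the cache layer with the blocks it holds so far (S313).
        tiers.cache.setdefault(disk_cluster, set()).add(block)
        log.append((block, source))
    # S316: the disk cluster now holds the full image. S314/S315 with a hypothetical
    # rule: join the distribution layer while it has fewer than `count_threshold` members.
    if len(tiers.distribution) < count_threshold:
        tiers.distribution[disk_cluster] = set(tiers.warehouse[image])
    return log
```

Take an image with blocks `b1` to `b3`, where `b1` is cached on `cache-1` and `b2` sits on distribution cluster `dist-1`. The three blocks are fetched from the cache layer, the distribution layer and the warehouse respectively. With a threshold of two, `disk-A` then ends up in both the cache layer and the distribution layer.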
[0142] In summary, by adopting the data distribution method described in the embodiments of this application, a data distribution network with at least three layers and full interconnection is constructed. In the data caching layer, each storage cluster caches part of the data, realizing distributed caching and traffic-balanced caching, minimizing access traffic to the data warehouse, reducing access traffic to the data distribution layer, avoiding network traffic concentration and single-point distribution capacity bottlenecks, and avoiding the impact of data warehouse access bandwidth limitations and processing capacity bottlenecks on disk (or cloud disk) read and write operations.
[0143] Figure 4 is a schematic diagram of the structure of the data distribution device provided in an embodiment of this application; as shown in Figure 4, an embodiment of this application provides a data distribution device, including:
[0144] The acquisition module 401 is configured to acquire a target disk creation instruction, which is used to instruct the download of target data to the target disk storage cluster, and the target data includes multiple target data blocks.
[0145] The first download module 402 is configured to download the target data block to the target disk storage cluster from the data cache layer if the target data block to be downloaded is stored in the data cache layer.
[0146] The second download module 403 is configured to download the target data block from the data distribution layer or data warehouse to the target disk storage cluster if the target data block is not stored in the data cache layer.
[0147] In some embodiments, the second download module 403 of the data distribution apparatus is configured as follows:
[0148] If the data distribution layer stores the target data block, then the target data block is downloaded from the data distribution layer to the target disk storage cluster;
[0149] If the data distribution layer does not store the target data block, then the target data block is downloaded from the data warehouse to the target disk storage cluster.
[0150] In some embodiments of the data distribution apparatus, the data caching layer includes at least one first storage cluster, each of which is configured to download and store all or part of the target data from the data distribution layer.
[0151] The first download module 402 is configured as follows:
[0152] Based on the current load and traffic information of the data caching layer, construct a data access routing table with load balancing and traffic balancing as the goals. The data access routing table indicates the target first storage cluster in the data caching layer that stores the target data block;
[0153] Obtain the target first storage cluster according to the data access routing table;
[0154] Download the target data block from the target first storage cluster to the target disk storage cluster.
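[0152] does not specify how load and traffic are combined. One plausible reading is sketched below purely as an assumption. It uses a greedy assignment: for each block, choose among the first storage clusters that cache it the one with the lowest weighted sum of current load and planned traffic. That cluster is then charged for the block, so later blocks spread to other holders. The function name, the `alpha` weight and the `block_cost` unit are hypothetical.

```python
def build_routing_table(holdings: dict, load: dict, traffic: dict, blocks: list,
                        block_cost: float = 1.0, alpha: float = 0.5) -> dict:
    """Greedily map each block to a first storage cluster in the cache layer.

    holdings: first cluster -> set of cached block ids
    load, traffic: first cluster -> current load / traffic (hypothetical units)
    """
    planned = dict(traffic)          # copy, so planned assignments do not alter the input
    table = {}
    for block in blocks:
        holders = [c for c, held in holdings.items() if block in held]
        if not holders:
            continue                 # not cached: fall back to distribution layer / warehouse
        # Lowest weighted score wins; ties go to the first holder listed.
        best = min(holders, key=lambda c: alpha * load[c] + (1 - alpha) * planned[c])
        table[block] = best
        planned[best] += block_cost  # charge the chosen cluster for the transfer
    return table
```

Looking up `table[block]` then plays the role of [0153]. A block absent from the table is left to the second download module.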
[0155] In some embodiments, the data distribution apparatus further includes:
[0156] The first addition module is configured to add the target disk storage cluster to the data cache layer when the full target data is downloaded to the target disk storage cluster.
[0157] In some embodiments of the data distribution apparatus, the data distribution layer includes at least one second storage cluster, each of which is configured to download and store all or part of the target data from the data warehouse.
[0158] The apparatus further includes:
[0159] The second adding module is configured to add the target disk storage cluster to the second storage cluster candidate sequence when all the target data is downloaded to the target disk storage cluster.
[0160] If the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster in the candidate sequence of the second storage clusters is added to the data distribution layer.
[0161] In some embodiments of the data distribution apparatus, the data distribution layer includes at least one second storage cluster, each of which is configured to download and store all or part of the target data from the data warehouse.
[0162] The apparatus further includes:
[0163] The third addition module is configured to add the target disk storage cluster to the data distribution layer if, when all target data is downloaded to the target disk storage cluster, the number of second storage clusters in the data distribution layer is less than a quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired.
[0164] In some embodiments, the data distribution apparatus further includes:
[0165] The first judgment module is configured to determine whether the target disk storage cluster stores the target data block;
[0166] The second judgment module is configured to determine whether the target data block is stored in the data cache layer if the target disk storage cluster does not store the target data block.
[0167] In some embodiments, the data distribution apparatus further includes:
[0168] The third judgment module is configured to determine whether the target data is stored in the data warehouse;
[0169] The fourth judgment module is configured to determine whether the target data block is stored in the data cache layer if the target data is stored in the data warehouse.
[0170] The data distribution apparatus provided in this embodiment of the application can execute the above data distribution method embodiment. Its implementation principle and technical effects are similar and are not repeated here.
[0171] Figure 5 is a hardware block diagram of a data distribution device provided in an embodiment of this application; as shown in Figure 5, an embodiment of this application also provides a data distribution device, including: a processor 501; and a memory 502 connected to the processor 501 via a bus interface, the memory 502 being used to store programs and data used by the processor 501 when performing operations, and the processor 501 calling and executing the programs and data stored in the memory 502.
[0172] The data distribution device further includes a transceiver 503, which is connected to a bus interface and is used to receive and send data under the control of the processor 501.
[0173] Specifically, the transceiver 503 is configured to perform the following processes under the control of the processor 501:
[0174] Obtain the target disk creation instruction, which instructs the download of target data to the target disk storage cluster, the target data including multiple target data blocks;
[0175] The processor 501 is used to read the program and execute the following procedures:
[0176] If the data cache layer stores the target data block to be downloaded, then the target data block is downloaded from the data cache layer to the target disk storage cluster;
[0177] If the target data block is not stored in the data cache layer, the target data block is downloaded from the data distribution layer or data warehouse to the target disk storage cluster.
[0178] In Figure 5, the bus architecture may include any number of interconnected buses and bridges, specifically linking various circuits of one or more processors represented by processor 501 and memory represented by memory 502. The bus architecture may also link various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further herein. A bus interface provides a user interface 504. A transceiver 503 may be multiple elements, including a transmitter and a receiver, providing a unit for communicating with various other devices over a transmission medium.
[0179] The processor 501 is responsible for managing the bus architecture and general processing, while the memory 502 can store the data used by the processor 501 when performing operations.
[0180] In some embodiments, the processor 501 of the data distribution device is specifically configured to read the program and execute the following processes:
[0181] If the data distribution layer stores the target data block, then the target data block is downloaded from the data distribution layer to the target disk storage cluster;
[0182] If the data distribution layer does not store the target data block, then the target data block is downloaded from the data warehouse to the target disk storage cluster.
[0183] In some embodiments of the data distribution device, the data caching layer includes at least one first storage cluster, each of which is used to download and store all or part of the target data from the data distribution layer.
[0184] The processor 501 is specifically used to read the program and execute the following processes:
[0185] Based on the current load and traffic information of the data caching layer, construct a data access routing table with load balancing and traffic balancing as the goals. The data access routing table indicates the target first storage cluster in the data caching layer that stores the target data block;
[0186] Obtain the target first storage cluster according to the data access routing table;
[0187] Download the target data block from the target first storage cluster to the target disk storage cluster.
[0188] In some embodiments, the processor 501 of the data distribution device is further configured to read the program and execute the following process:
[0189] When all target data is downloaded to the target disk storage cluster, the target disk storage cluster is added to the data cache layer.
[0190] In some embodiments of the data distribution device, the data distribution layer includes at least one second storage cluster, each of which is used to download and store all or part of the target data from the data warehouse.
[0191] The processor 501 is also configured to read the program and execute the following processes:
[0192] If all the target data is downloaded to the target disk storage cluster, the target disk storage cluster is added to the second storage cluster candidate sequence.
[0193] If the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster in the candidate sequence of the second storage clusters is added to the data distribution layer.
[0194] In some embodiments of the data distribution device, the data distribution layer includes at least one second storage cluster, each of which is used to download and store all or part of the target data from the data warehouse.
[0195] The processor 501 is also configured to read the program and execute the following processes:
[0196] If the target data is downloaded to the target disk storage cluster in full, and the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster is added to the data distribution layer.
[0197] In some embodiments, the processor 501 of the data distribution device is further configured to read the program and execute the following processes:
[0198] Determine whether the target disk storage cluster stores the target data block;
[0199] If the target disk storage cluster does not store the target data block, then it is determined whether the target data block is stored in the data cache layer.
[0200] In some embodiments, the processor 501 of the data distribution device is further configured to read the program and execute the following processes:
[0201] Determine whether the target data is stored in the data warehouse;
[0202] If the target data is stored in the data warehouse, then determine whether the target data block is stored in the data cache layer.
[0203] This application also provides a computer-readable storage medium storing a computer program thereon. When the program is executed by a processor, it implements the steps in the above-described data distribution method and achieves the same technical effect. To avoid repetition, it will not be described again here.
[0204] In addition, this application also provides a computer program product, including computer instructions. When the computer instructions are executed by a processor, they implement the various processes of the method embodiment shown in FIG. 2 above and can achieve the same technical effect. To avoid repetition, they will not be described again here.
[0205] In the several embodiments provided in this application, it should be understood that the disclosed methods and apparatus can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of units is only a logical functional division, and in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
[0206] Furthermore, the functional units in the various embodiments of this application may be integrated into one processing unit. Alternatively, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
[0207] The above integrated units implemented as software functional units can be stored in a computer-readable storage medium. These software functional units are stored in a storage medium. They include several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute some steps of the data distribution method described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.
[0208] The above description represents optional embodiments of this application. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles described in this application, and these improvements and modifications should also be considered within the scope of protection of this application.
Claims
1. A data distribution method, comprising: Obtain the target disk creation instruction, which instructs the download of target data to the target disk storage cluster, the target data including multiple target data blocks; If the data cache layer stores the target data block to be downloaded, then the target data block is downloaded from the data cache layer to the target disk storage cluster; If the target data block is not stored in the data cache layer, the target data block is downloaded from the data distribution layer or data warehouse to the target disk storage cluster.
2. The data distribution method according to claim 1, wherein, The step of downloading the target data block from the data distribution layer or data warehouse to the target disk storage cluster includes: If the data distribution layer stores the target data block, then the target data block is downloaded from the data distribution layer to the target disk storage cluster; If the data distribution layer does not store the target data block, then the target data block is downloaded from the data warehouse to the target disk storage cluster.
3. The data distribution method according to claim 1, wherein, The data caching layer includes at least one first storage cluster, each of which is used to download and store full or part of the target data from the data distribution layer. The step of downloading the target data block from the data cache layer to the target disk storage cluster includes: Based on the current load and traffic information of the data caching layer, a data access routing table is constructed with the goal of load balancing and traffic balancing. The data access routing table is used to indicate the target first storage cluster in the data caching layer where the target data block is stored. Obtain the target first storage cluster according to the data access routing table; Download the target data block from the target first storage cluster to the target disk storage cluster.
4. The data distribution method according to claim 1, wherein, The method further includes: When all target data is downloaded to the target disk storage cluster, the target disk storage cluster is added to the data cache layer.
5. The data distribution method according to claim 1, wherein, The data distribution layer includes at least one second storage cluster, each of which is used to download and store full or part of the target data from the data warehouse. The method further includes: If all the target data is downloaded to the target disk storage cluster, the target disk storage cluster is added to the second storage cluster candidate sequence. If the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster in the candidate sequence of the second storage clusters is added to the data distribution layer.
6. The data distribution method according to claim 1, wherein, The data distribution layer includes at least one second storage cluster, each of which is used to download and store full or part of the target data from the data warehouse. The method further includes: If the target data is downloaded to the target disk storage cluster in full, and the number of second storage clusters in the data distribution layer is less than the quantity threshold or the cache period of the target second storage cluster in the data distribution layer has expired, then the target disk storage cluster is added to the data distribution layer.
7. The data distribution method according to claim 1, wherein, before the step of downloading the target data block from the data cache layer to the target disk storage cluster if the target data block to be downloaded is stored in the data cache layer, the method further includes: Determine whether the target disk storage cluster stores the target data block; If the target disk storage cluster does not store the target data block, then determine whether the target data block is stored in the data cache layer.
8. The data distribution method according to claim 1, wherein, before the step of downloading the target data block from the data cache layer to the target disk storage cluster if the target data block to be downloaded is stored in the data cache layer, the method further includes: Determine whether the target data is stored in the data warehouse; If the target data is stored in the data warehouse, then determine whether the target data block is stored in the data cache layer.
9. A data distribution device, comprising: The acquisition module is configured to acquire a target disk creation instruction, which is used to instruct the download of target data to the target disk storage cluster. The target data includes multiple target data blocks. The first download module is configured to download the target data block to the target disk storage cluster from the data cache layer if the target data block to be downloaded is stored in the data cache layer. The second download module is configured to download the target data block from the data distribution layer or data warehouse to the target disk storage cluster if the target data block is not stored in the data cache layer.
10. A data distribution device, comprising: A processor, a memory, and a program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the data distribution method as described in any one of claims 1 to 8.
11. A readable storage medium storing a program that, when executed by a processor, implements the data distribution method as described in any one of claims 1 to 8.
12. A computer program product comprising computer instructions that, when executed by a processor, implement the data distribution method as described in any one of claims 1 to 8.