Data scheduling method, distributed storage system, computing device, storage medium and program product
By employing multiple storage media types in a distributed storage system and scheduling data according to the characteristics and needs of the target object, the problem of low service availability caused by a single media type is solved, achieving a balance between performance and cost.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ALIBABA CLOUD COMPUTING CO LTD
- Filing Date
- 2024-12-27
- Publication Date
- 2026-06-30
Abstract
Description
Technical Field
[0001] This application relates to the field of storage technology, and in particular to a data scheduling method, a distributed storage system, a computing device, a storage medium, and a program product.
Background Technology
[0002] A distributed storage system is a storage system that distributes data across multiple independent data nodes. These data nodes are interconnected through a network and work together to present a unified pool of storage resources. Distributed storage systems typically use storage media as the physical carrier for storing data, aiming to preserve data long-term.
[0003] Currently, improving the service availability of distributed storage systems is a pressing technical problem that needs to be solved.
Summary of the Invention
[0004] This application provides a data scheduling method, a distributed storage system, a computing device, a storage medium, and a program product to solve the technical problem of low service availability in the prior art.
[0005] In a first aspect, embodiments of this application provide a data scheduling method applied to a distributed storage system, the distributed storage system providing storage media of multiple media types; the distributed storage system includes multiple data nodes; each of the multiple data nodes deploys at least one type of storage media; the method includes:
[0006] In response to a data scheduling request, determine at least one piece of data to be scheduled corresponding to the target object;
[0007] From the multiple media types, at least two media types corresponding to the target object are determined;
[0008] Based on the at least two media types, determine the target storage medium corresponding to each of the at least one data to be scheduled and determine the target data node that provides the target storage medium;
[0009] The at least one data to be scheduled is stored in the target storage medium of its corresponding target data node.
[0010] In a second aspect, embodiments of this application provide a distributed storage system that provides storage media of multiple media types; the distributed storage system includes a control node and multiple data nodes; each of the multiple data nodes deploys at least one type of storage media, wherein:
[0011] A control node is configured to, in response to a data scheduling request, determine at least one data item to be scheduled corresponding to a target object; determine at least two media types corresponding to the target object from the multiple media types; determine the target storage medium corresponding to each of the at least one data item to be scheduled, and determine the target data node providing the target storage medium, according to the at least two media types.
[0012] Data nodes are used to store any allocated data to be scheduled into the corresponding target storage medium.
[0013] In a third aspect, embodiments of this application provide a data scheduling device applied to a distributed storage system, the distributed storage system providing storage media of multiple media types; the distributed storage system includes multiple data nodes; each of the multiple data nodes deploys at least one type of storage media; the device includes:
[0014] The data determination module is used to determine at least one piece of data to be scheduled corresponding to the target object in response to a data scheduling request.
[0015] The first determining module is used to determine at least two media types corresponding to the target object from the multiple media types;
[0016] The second determining module is used to determine, according to the at least two media types, the target storage medium corresponding to each of the at least one data to be scheduled and the target data node that provides the target storage medium.
[0017] A storage module is used to store the at least one data to be scheduled into the target storage medium in the corresponding target data node.
[0018] In a fourth aspect, embodiments of this application provide a computing device, including a processing component and a storage component;
[0019] The storage component stores one or more computer instructions; the one or more computer instructions are invoked and executed by the processing component to implement the data scheduling method provided in the embodiments of this application.
[0020] In a fifth aspect, embodiments of this application provide a computer-readable storage medium storing a computer program which, when executed by a processing component, implements the data scheduling method provided in this application.
[0021] In a sixth aspect, embodiments of this application provide a computer program product, including a computer program / instructions which, when executed by a processing component, implement the data scheduling method provided in this application.
[0022] In the embodiments of this application, the distributed storage system provides storage media of multiple media types. For at least one data to be scheduled corresponding to a target object, at least two media types corresponding to the target object are first determined from the multiple media types. According to the at least two media types, the target storage media corresponding to each of the at least one data to be scheduled and the target data node providing the target storage media are determined. Then, the at least one data to be scheduled can be stored in the target storage media in the corresponding target data node. In the embodiments of this application, at least two media types are used to provide storage services for the target object, rather than using a single media type, thereby taking into account the storage advantages of different media types, improving the service availability of the distributed storage system, and improving the storage experience.
[0023] These or other aspects of this application will become more apparent in the following description of the embodiments.
Attached Figure Description
[0024] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0025] Figure 1 is a flowchart illustrating a data scheduling method provided in one embodiment of this application;
[0026] Figure 2 is a block diagram of a distributed storage system provided in an embodiment of this application;
[0027] Figure 3 is a schematic diagram of a distributed storage system provided in one embodiment of this application;
[0028] Figure 4 is a block diagram of a data scheduling apparatus according to an embodiment of this application;
[0029] Figure 5 is a schematic diagram of the structure of a computing device provided in one embodiment of this application.
Detailed Implementation
[0030] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0031] In some of the processes described in the specification, claims, and accompanying drawings of this application, multiple operations appearing in a specific order are included. However, it should be clearly understood that these operations may not be executed in the order they appear herein, or may be executed in parallel. The operation numbers, such as 101, 102, etc., are merely used to distinguish different operations and do not themselves represent any execution order. Furthermore, these processes may include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that the descriptions such as "first," "second," etc., in this document are used to distinguish different messages, devices, modules, etc., and do not represent a chronological order, nor do they limit "first" and "second" to different types.
[0032] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals are provided for users to choose to authorize or refuse.
[0033] It should be noted that the technical solutions of the embodiments of this application apply to a networked virtual environment, and the users described generally refer to "virtual users". Real users can obtain a user identity in the network environment by registering a user account on the server.
[0034] The technical solutions of this application can be applied to distributed storage scenarios. A distributed storage system is a storage system that distributes data across multiple data nodes. These data nodes are interconnected and work collaboratively through a network, presenting themselves as a unified storage resource pool under the coordination of a control node. Distributed storage systems typically use storage media as the physical carrier for storing data to preserve it long-term.
[0035] As described above, a distributed storage system can consist of a control node, also known as the Master node, and multiple data nodes. The control node acts as the "brain" of the distributed storage system. It is responsible for managing and coordinating the operation of the entire storage system, including tasks such as storage resource allocation, data scheduling, replica management, and system monitoring. For example, when new data needs to be stored, the control node determines which data nodes the data should be stored on based on a pre-defined data distribution strategy (such as hash distribution or replication strategy), ensuring that the data is distributed reasonably and efficiently throughout the storage system. Data nodes are the actual units for storing data in a distributed storage system. Each data node is equipped with storage media, such as HDDs (Hard Disk Drives) or SSDs (Solid-State Drives), to store data and provide data retrieval services when needed.
[0036] In distributed storage systems, data availability and fault tolerance are ensured by dividing large datasets (such as files) into multiple smaller chunks and replicating and distributing these chunks across different data nodes. Distributed storage systems typically employ data redundancy strategies such as multi-replica or erasure coding (EC). Multi-replica involves dividing the original data into multiple chunks and replicating each chunk multiple times, with each replica distributed across different data nodes. Erasure coding, on the other hand, divides the original data into multiple chunks and generates additional checksum blocks. These chunks and checksum blocks are then distributed across different data nodes, achieving data redundancy and recovery. In practical applications, distributed storage systems can provide storage services to service providers. When a distributed storage system serves as the underlying storage system, the service provider can refer to a storage service system built upon it, such as OSS (Object Storage Service) or EBS (Elastic Block Store). Alternatively, a distributed storage system can also serve as a user-facing storage system, where the service provider is the user.
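The two redundancy strategies described above can be illustrated with a minimal Python sketch. It is not taken from this application: the function names are hypothetical, and a single XOR parity block stands in for the checksum blocks. Production erasure codes such as Reed-Solomon generate several checksum blocks and tolerate more simultaneous losses.

```python
from functools import reduce


def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into fixed-size chunks, zero-padding the last chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if chunks and len(chunks[-1]) < chunk_size:
        chunks[-1] = chunks[-1].ljust(chunk_size, b"\x00")
    return chunks


def replicate(chunk: bytes, copies: int) -> list[bytes]:
    """Multi-replica strategy: each chunk is simply copied `copies` times."""
    return [chunk] * copies


def xor_parity(chunks: list[bytes]) -> bytes:
    """Simplified checksum block: byte-wise XOR of equally sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing chunk from the survivors and the parity."""
    return xor_parity(surviving + [parity])


chunks = split_into_chunks(b"abcdefgh", 4)
parity = xor_parity(chunks)
rebuilt = recover_missing([chunks[0]], parity)  # pretend chunks[1] was lost
```

In both strategies, each replica or block would then be placed on a different data node so that a single node failure does not remove all copies.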
[0037] In realizing the concept of this application, the inventors discovered that different types of storage media have different characteristics. For example, high-speed storage media such as SSDs typically have high storage performance and fast read / write speeds, enabling rapid response to data read and write requests. However, they have low capacity and high cost. Conversely, slow-speed storage media such as HDDs have lower storage performance than high-speed media, but correspondingly they have higher capacity and lower cost. Currently, distributed storage systems typically provide only one type of storage media to service providers. Using only high-speed media leads to high costs, while using only slow-speed media degrades read / write performance, resulting in low service availability for distributed storage systems.
[0038] Therefore, improving the service availability of distributed storage systems has become an urgent technical problem to be solved.
[0039] In the embodiments of this application, the distributed storage system provides storage media of multiple media types, and uses at least two media types to provide storage services for the target object, rather than using a single media type. This can take into account the storage advantages of different media types, thereby improving the availability of the distributed storage system and enhancing the storage experience.
[0040] In one implementation, at least two media types can include at least a fast media type and a slow media type, thereby balancing storage performance and cost, and improving service availability without reducing performance or increasing cost.
[0041] Furthermore, fast media types can have higher storage priority than slow media types. Fast media types can be used first to provide storage services to service providers. When the storage resources corresponding to fast media types are insufficient, the service can be downgraded to slow media types, and the slow media types will then be used to provide storage services to service providers. Through adaptive degradation, service availability and resource utilization are further improved without reducing performance or increasing costs.
[0042] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0043] Figure 1 is a flowchart illustrating a data scheduling method according to an embodiment of this application. This data scheduling method can be applied to a distributed storage system, which provides storage media of multiple media types. The distributed storage system includes multiple data nodes, and each of the multiple data nodes deploys at least one type of storage media. The technical solution in this embodiment is executed by the control node in the distributed storage system. As shown in Figure 1, the method may include the following steps:
[0044] 101: In response to a data scheduling request, determine at least one piece of data to be scheduled corresponding to the target object.
[0045] In the embodiments of this application, during the actual operation of the distributed storage system, data scheduling requests may be triggered by a variety of situations.
[0046] In one alternative embodiment of this application, the data scheduling request may be triggered by the service provider, so the method may further include: obtaining the data scheduling request sent by the service provider.
[0047] As mentioned above, in practical applications, the service provider of a distributed storage system can include a storage service system built on top of the distributed storage system, or a user. Therefore, data scheduling requests can be triggered when the storage service system or the user has a new data write requirement.
[0048] In another alternative embodiment of this application, the data scheduling request may also be generated when an anomaly is detected in any data corresponding to the target object, such as data corruption or degraded storage performance of the storage medium, which requires data rescheduling for data recovery or data migration.
[0049] In one alternative embodiment of this application, the target object can be a data block obtained by splitting the original data, and the data to be scheduled can be implemented as replica data of the target object. The at least one data to be scheduled can include multiple replicas, which are generated by copying the target object multiple times to implement the multi-replica data redundancy strategy described above.
[0050] The aforementioned data scheduling request may be triggered by the service provider requesting to store multiple copies of the target object's data, or it may be generated when the control node detects an anomaly in any copy of the target object's data.
[0051] In another alternative embodiment of this application, the target object may refer to the original data, and the at least one data to be scheduled may include the data blocks and checksum blocks obtained by dividing the target object, so as to implement the erasure coding data redundancy strategy described above.
[0052] 102: From multiple media types, determine at least two media types corresponding to the target object.
[0053] In embodiments of this application, the different types of storage media provided by the distributed storage system can be categorized according to different dimensions. For example, in one possible implementation, the media types can be categorized according to storage performance into fast media, medium-speed media, and slow media; or according to storage capacity into large-capacity media, medium-capacity media, and small-capacity media; or according to cost into high-cost media, medium-cost media, and low-cost media; or according to working principle into SSD type, HDD type, etc. The same storage medium can be classified into different media types. For example, SSDs have faster read and write speeds and belong to the fast media type; their common capacities range from 128GB to several TB, belonging to the medium-capacity media type. Solid-state drives have relatively high unit storage costs and belong to the high-cost media type.
[0054] In practical applications, different media types can be classified by selecting one or more classification dimensions according to different application requirements.
[0055] For ease of understanding, the following one or more embodiments primarily use the classification of media types according to storage performance dimensions to explain the specific implementation methods of the embodiments of this application. Storage performance may include performance indicators such as read / write speed or read / write latency.
[0056] In the embodiments of this application, at least two media types corresponding to the target object can be determined from a variety of media types based on the storage method preset by the service provider. For example, determining at least two media types corresponding to the target object can specifically be implemented by: determining the service provider corresponding to the target object, and then determining the storage method pre-configured by the service provider, which may include at least two media types. Thus, the at least two media types configured in the storage method can be determined as the at least two media types corresponding to the target object.
[0057] In another possible implementation of this application, at least two media types corresponding to the target object can be determined based on the data characteristics of the target object. These data characteristics may include, for example, access frequency, data importance, and data size.
[0058] For example, regarding access frequency, among at least one set of data to be scheduled for a target object, frequently accessed data, such as real-time order information from e-commerce platforms or user activity information from popular social media platforms, is considered "hot data" and requires fast read / write responses. For hot data, a fast media type (such as SSD) can be chosen. For infrequently accessed data, such as backup files, historical records, or rarely accessed archives—"cold data"—relatively slower but high-capacity media types such as HDD or magnetic tape can be used.
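As a rough illustration of the hot / cold distinction above, the following Python sketch maps an access frequency to a media type. The threshold value, the function name, and the two-way SSD / HDD mapping are assumptions made for this example only; the application does not specify how access frequency is measured or where the boundary lies.

```python
def media_type_for_access_rate(accesses_per_day: float,
                               hot_threshold: float = 100.0) -> str:
    """Return a media type for data with the given access frequency.

    Data accessed at least `hot_threshold` times per day is treated as
    hot data and mapped to the fast media type; everything else is
    treated as cold data and mapped to the slow media type.
    """
    return "SSD" if accesses_per_day >= hot_threshold else "HDD"


# Hypothetical examples: live order data versus an old backup file.
live_orders = media_type_for_access_rate(5000.0)
old_backup = media_type_for_access_rate(0.1)
```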
[0059] In another possible implementation of this application, at least two media types corresponding to the target object can be determined based on the storage resources of the distributed storage system. For example, if the storage space of SSDs in the storage system is limited, while HDD resources are sufficient, for a new target object, HDD resources may be utilized as rationally as possible based on the data characteristics, while a small number of SSDs may be used for data storage of critical parts. For example, in a small data center with limited storage resources, for newly uploaded user files, a small portion of critical data such as the file index may be stored on limited SSDs, while the file body may be stored on HDDs.
[0060] In a practical application, in order to balance storage performance and cost, the at least two media types may include a fast media type and a slow media type.
[0061] 103: Based on at least two media types, determine the target storage medium corresponding to each of the data to be scheduled and the target data node that provides the target storage medium.
[0062] 104: Store at least one data to be scheduled in the target storage medium of its corresponding target data node.
[0063] In embodiments of this application, the control node can record and manage the media types and distribution of all storage media in the distributed storage system. The control node can maintain a detailed record, including the media type of each storage medium in the distributed storage system, such as SSDs (fast media) and HDDs (slow media), as well as their deployment on each data node. For example, the control node records that data node A is equipped with both SSDs and HDDs, while data node B only has HDDs.
[0064] When a new target object needs to be stored, and at least two media types have been identified, the control node can make a series of scheduling decisions based on its recorded information. For example, as mentioned above, the control node can determine at least two media types that match the target object based on its data characteristics, such as access frequency, data importance, security requirements, data size, or pre-configured storage method.
[0065] Next, the control node can determine the target media type corresponding to each piece of data to be scheduled from at least two media types, and determine the target data node that provides the target storage medium of that target media type. Then, the control node can allocate the data to be scheduled to the target storage medium of that target data node for storage. The target storage medium can be any storage medium of the target media type, and the target data node can be any data node that provides the target storage medium. For example, if some of the data to be scheduled for a target object corresponds to an SSD, the control node can search all data nodes equipped with SSD-type storage media, and then randomly select target data node A. The control node can then allocate the data to be scheduled to the SSD-type target storage medium on target data node A for storage.
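The lookup described in the example above (find all data nodes equipped with the target media type, then pick one at random) might be sketched in Python as follows. The inventory layout, node names, and function names are hypothetical and are not defined by this application.

```python
import random
from typing import Optional

# Hypothetical record kept by the control node: node name -> media types.
INVENTORY = {
    "node-A": {"SSD", "HDD"},
    "node-B": {"HDD"},
}


def nodes_with(inventory: dict[str, set[str]], media_type: str) -> list[str]:
    """List the data nodes that deploy storage media of the given type."""
    return sorted(name for name, types in inventory.items()
                  if media_type in types)


def pick_target_node(inventory: dict[str, set[str]], media_type: str,
                     rng: Optional[random.Random] = None) -> str:
    """Randomly select one data node that provides the target media type."""
    candidates = nodes_with(inventory, media_type)
    if not candidates:
        raise LookupError(f"no data node provides {media_type}")
    return (rng or random).choice(candidates)


ssd_node = pick_target_node(INVENTORY, "SSD")
```

Random selection is only the simplest policy; the same candidate list could be ranked by load or network conditions instead.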
[0066] Therefore, the control node can determine at least one target data node for at least one scheduled data of a target object, and this target data node provides a target storage medium of the corresponding target medium type. In this way, the target object can be effectively allocated to a storage medium suitable for its characteristics, meeting different data storage needs, while ensuring high data availability and efficient operation of the distributed storage system.
[0067] After determining the target storage medium and the target data node that provides the target storage medium, the control node can store at least one data to be scheduled into the target storage medium in its corresponding target data node.
[0068] In the embodiments of this application, the distributed storage system provides storage media of multiple media types. For at least one data to be scheduled corresponding to a target object, at least two media types corresponding to the target object are first determined from the multiple media types. According to the at least two media types, the target storage media corresponding to each of the at least one data to be scheduled and the target data node providing the target storage media are determined. Then, the at least one data to be scheduled can be stored in the target storage media in the corresponding target data node. In the embodiments of this application, at least two media types are used to provide storage services for the target object, rather than using a single media type, thereby improving resource utilization and improving the availability of the distributed storage system.
[0069] In the process of developing this application, the inventors discovered that traditional methods, which use only one type of storage medium for data scheduling (e.g., using SSDs to ensure storage performance), require reserving a significant amount of SSD resources in the distributed storage system to prevent the SSDs from being filled up in abnormal user scenarios, leading to substantial cost pressure. Furthermore, traditional methods trigger alarms when SSD resource utilization exceeds 80% to ensure service availability. However, because SSDs have a high write speed, even with 20% of resources reserved, the SSD resources may become full if operations staff cannot quickly identify and fix the problem, posing a risk of service interruption. Therefore, to further improve service availability, in some embodiments, determining the target storage medium for the at least one data to be scheduled based on the at least two media types can be implemented as follows: for any data to be scheduled, in descending order of storage priority, sequentially determine whether the remaining storage resources of each media type corresponding to the target object meet the storage conditions, until a target media type whose remaining storage resources meet the storage conditions is obtained; then determine at least one target storage medium belonging to the target media type for the data to be scheduled.
[0070] Storage priority can be preset based on one or more factors such as the performance characteristics and cost of different types of storage media.
[0071] For example, based on read / write speed, SSDs can be given higher priority due to their fast read / write capabilities; HDDs, with their relatively slower read / write speeds, may have a lower priority than SSDs; and tape, with its even slower read / write speeds compared to HDDs, generally has an even lower priority. However, in scenarios where cost is a major concern or where large-capacity storage is prioritized, the priority may differ. For instance, for long-term archived data, tape, with its advantages of large capacity and low cost, may have a higher priority in such scenarios.
[0072] In one optional embodiment of this application, the storage priority can be configured by the service provider. When the storage method configured by the service provider includes at least one media type, it can also include the storage priority corresponding to each of the at least one media type. Therefore, for any data to be scheduled, the control node can sequentially determine whether the remaining storage resources of each media type corresponding to the target object meet the storage conditions, based on the storage priority pre-configured by the service provider.
[0073] In another alternative embodiment of this application, the storage priority can also be preset by the control node based on one or more factors such as the performance characteristics and cost of different media types of storage media.
[0074] Furthermore, storage priorities can be adjusted based on the specific data requirements of the usage scenario. For example, for real-time transaction data from e-commerce platforms that requires frequent read / write operations and has extremely high response speed requirements, SSDs can be set as a higher priority media type from both performance and importance perspectives. On the other hand, for some historical order data, although the read / write frequency is not high, it is occasionally needed for querying. In this case, HDDs may become a relatively suitable priority choice, ranking after SSDs.
[0075] The remaining storage resources for any media type corresponding to the target object can refer to the remaining storage resources requested by the service provider for that media type. For example, the service provider can request 30TB (Terabyte) of HDD storage resources and 20TB of SSD storage resources, etc.
[0076] Storage conditions could be, for example, that the remaining storage resources are greater than the size of the data to be scheduled, or that the remaining storage resources are greater than the size of the data to be scheduled and the difference between the remaining storage resources and the size of the data to be scheduled is greater than a certain threshold, so as to ensure that the remaining storage resources are sufficient to write the data to be scheduled.
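The two example storage conditions above can be expressed as a single predicate. The following Python sketch uses hypothetical names; `headroom_bytes` plays the role of the "certain threshold", and a value of zero reduces the check to the simpler condition that the remaining resources exceed the data size.

```python
TB = 10**12  # decimal terabyte, used here only to keep the numbers readable


def meets_storage_condition(remaining_bytes: int, data_bytes: int,
                            headroom_bytes: int = 0) -> bool:
    """Check whether the data can be written to a media type.

    The condition holds when the space left over after the write
    (remaining minus data size) is strictly greater than the headroom.
    """
    return remaining_bytes - data_bytes > headroom_bytes


fits_plain = meets_storage_condition(20 * TB, 1 * TB)
fits_with_headroom = meets_storage_condition(2 * TB, 1 * TB, headroom_bytes=1 * TB)
```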
[0077] This embodiment's technical solution allows data to be scheduled to a high-priority storage medium when resources of that medium type are sufficient. If the remaining storage resources of the high-priority medium type are insufficient, adaptive storage degradation can be implemented, further improving storage service availability and preventing storage failures. For example, at least two medium types are included, a first medium type and a second medium type. The first medium type has a higher storage priority than the second medium type. Therefore, the data to be scheduled for the target object can be preferentially scheduled to the first medium type storage medium. If the remaining storage resources of the first medium type storage medium for the target object are insufficient, it can be degraded to the second medium type storage medium.
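The adaptive degradation described above amounts to walking the media types in descending storage priority and stopping at the first one that can hold the data. A minimal Python sketch, with hypothetical names and with "remaining resources greater than the data size" assumed as the storage condition:

```python
from typing import Optional


def choose_media_type(priority_order: list[str],
                      remaining: dict[str, int],
                      data_size: int) -> Optional[str]:
    """Return the highest-priority media type that can store the data.

    `priority_order` lists media types from highest to lowest storage
    priority. `remaining` maps each media type to the remaining storage
    resources available to the target object. Returns None when no media
    type satisfies the storage condition.
    """
    for media_type in priority_order:
        if remaining.get(media_type, 0) > data_size:
            return media_type
    return None


quota = {"SSD": 10, "HDD": 1000}
small_write = choose_media_type(["SSD", "HDD"], quota, 5)   # fits on SSD
large_write = choose_media_type(["SSD", "HDD"], quota, 50)  # degrades to HDD
```

With only two media types in the list, this reduces to the fast-then-slow fallback; longer lists (for example SSD, HDD, tape) are handled by the same loop.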
[0078] After determining the target media type (e.g., HDD), the specific target storage medium can be further determined. In a distributed storage system, multiple data nodes may have HDDs deployed. In this case, the control node can comprehensively consider factors such as the network connectivity and load balancing of the data nodes to select the specific target storage medium. For example, both data node A and data node B have available HDDs, but data node A has a more stable network connection to the data source and a lower current load. Therefore, the control node can determine the HDD on data node A as the target storage medium to store the corresponding data to be scheduled.
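One way to read the node-selection example above is as a ranking over candidate data nodes. The Python sketch below is an assumption-laden illustration: the `NodeStatus` fields, the load scale, and the ordering (stable links first, then lowest load) are hypothetical, since the application names the factors but not how they are weighed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStatus:
    name: str
    free_hdd_bytes: int  # remaining capacity on the node's HDDs
    load: float          # hypothetical scale: 0.0 idle .. 1.0 saturated
    link_stable: bool    # whether the link to the data source is stable


def pick_hdd_node(nodes: list[NodeStatus], data_size: int) -> Optional[str]:
    """Pick the node whose HDD should store the data, or None if none fits."""
    candidates = [n for n in nodes if n.free_hdd_bytes > data_size]
    if not candidates:
        return None
    # Prefer stable links (False sorts before True), then the lowest load.
    best = min(candidates, key=lambda n: (not n.link_stable, n.load))
    return best.name


nodes = [
    NodeStatus("node-A", free_hdd_bytes=500, load=0.2, link_stable=True),
    NodeStatus("node-B", free_hdd_bytes=900, load=0.7, link_stable=False),
]
chosen = pick_hdd_node(nodes, data_size=100)
```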
[0079] In some embodiments, as described above, the at least two media types corresponding to the target object may include at least a fast media type and a slow media type.
[0080] Among these, the storage priority of the fast media type can be higher than that of the slow media type. In this case, for any data to be scheduled, the step of checking, in order of storage priority from high to low, whether the remaining storage resources of each media type corresponding to the target object meet the storage conditions, until a target media type whose remaining storage resources meet the storage conditions is found, can include:
[0081] For any data to be scheduled, determine whether the remaining storage resources of the fast media type corresponding to the target object meet the storage conditions; if yes, determine that the target media type corresponding to the data to be scheduled is a fast media type; if no, determine that the target media type corresponding to the data to be scheduled is a slow media type.
[0082] Storage services are first provided to the service provider using fast media; when the storage resources corresponding to the fast media are insufficient, storage is downgraded to slow media, which continue to provide storage services to the service provider. This approach balances storage performance and cost, improving service availability without sacrificing performance or increasing costs.
[0083] In some embodiments, the at least one data to be scheduled can be multiple replica data of the target object.
[0084] In this context, replica data refers to data obtained by copying the target object. In distributed storage systems, multiple replicas of the target object can be created to improve data reliability, availability, and redundancy, and to support fault tolerance and disaster recovery.
[0085] The above-mentioned determination of at least two media types corresponding to the target object from multiple media types can be specifically implemented as follows: determining the storage method corresponding to the target object from multiple media types; the storage method may include at least two media types and the number of copies corresponding to each of the at least two media types.
[0086] The above-mentioned determination of at least one target storage medium corresponding to each of the data to be scheduled and the target data node providing the target storage medium according to at least two media types can be specifically implemented as follows: determining the target storage medium corresponding to multiple replica data and the target data node providing the target storage medium according to at least two media types and the number of replicas corresponding to at least two media types respectively.
[0087] By distributing multiple copies of data across target storage media of at least two media types, the advantages of multiple media types can be combined, thereby improving service availability in the case of replica storage.
[0088] Considering the characteristics of different storage media types, multiple replica data of a target object can be rationally allocated based on media type. For example, the at least two media types can include a fast media type and a slow media type. During the development of this application, the inventors discovered that in traditional methods, the data to be scheduled is written entirely to either a fast media type or a slow media type, resulting in low service availability of the distributed storage system. To improve service availability, this embodiment uses at least two media types to store the target object's data to be scheduled, which suits application scenarios with different read / write performance requirements. Some replicas can be stored on SSDs, leveraging their fast read / write performance to meet frequent data access needs, while other replicas are stored on relatively slow but high-capacity media such as HDDs or magnetic tapes for data backup and long-term archiving.
[0089] For example, in a practical application, a service provider can pre-configure two media types for target objects that are insensitive to write latency but sensitive to read latency: SSD and HDD. They can also allocate the number of replicas for each media type. For instance, for a target object requiring three copies of data, the service provider can configure one replica on the SSD and two replicas on the HDD. When receiving a data scheduling request from the service provider for three copies of the target object, one copy can be placed on the SSD, and the other two on the HDD. When reading data from the target object, the corresponding copy can be retrieved from the SSD to ensure read speed. Compared to storing all three copies on SSDs, this 1SSD, 2HDD configuration offers lower storage costs (only one copy is stored on the SSD) and near-HDD write latency, while maintaining SSD read performance. This achieves higher read performance while reducing storage costs.
[0090] The storage method indicates the media type and the corresponding number of copies, which can be flexibly selected according to actual application needs.
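As an illustration of such a storage method, the following sketch expands a per-media-type replica count into one media type per replica, using the 1 SSD + 2 HDD example. The dictionary layout and the function name `assign_replicas` are assumptions made for this example, not a format defined by this application.

```python
def assign_replicas(storage_method):
    """Expand {media_type: replica_count} into a list holding
    one media type per replica, in the configured order."""
    plan = []
    for media, count in storage_method.items():
        plan.extend([media] * count)
    return plan


# Storage method from the example: three replicas, one on SSD and two on HDD.
storage_method = {"SSD": 1, "HDD": 2}
plan = assign_replicas(storage_method)
```

Each entry of `plan` would then be resolved to a concrete storage medium and data node by the control node.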
[0091] In the embodiments of this application, the allocation of the number of replicas across different media can also be determined in conjunction with the resource status of the storage system. For example, if SSD resources are limited in the distributed storage system, or the storage resources of the SSDs requested by the service provider are small, the number of replicas allocated to the SSDs can be appropriately reduced while increasing the number of replicas on other media such as HDDs, provided that certain performance is guaranteed.
[0092] In some embodiments, multiple data nodes can be distributed across multiple racks, forming a rack set. In large computer network environments such as data centers or computing clusters, physical machines and other equipment are typically installed in standard-sized racks. A rack can be a metal frame structure, similar to a multi-tiered bookshelf, where physical machines can be placed like books. In the embodiments of this application, data nodes are also physical machines, and the multiple data nodes included in the distributed storage system are distributed across multiple racks.
[0093] The above-mentioned determination of at least one target storage medium corresponding to each of the data to be scheduled and the target data node providing the target storage medium, based on at least two media types, can be specifically implemented as follows: determining the number of storage constraints in any rack according to the security requirements corresponding to the target object; and determining the target storage medium corresponding to each of the data to be scheduled, the target data node providing the target storage medium, and the target rack where the target data node is located, based on at least two media types and the number of storage constraints.
[0094] In distributed storage systems, different target objects have different security requirements due to their importance, application scenarios, and other factors. These security requirements are often related to fault tolerance. For example, a security requirement might be to ensure that even if a rack fails (such as a power outage, network failure, or hardware damage), the data remains intact and available without affecting the normal operation of the system.
[0095] Based on the security requirements of the target object, the control node can determine the number of storage constraints in each rack accordingly. The number of storage constraints limits the maximum number of pieces of data to be scheduled belonging to the target object that can be allocated to each rack. The security requirements may include, for example, Rack_Domain or Rack-Machine_Domain. They can be configured by the service provider, and therefore the aforementioned preset storage method may also include such security requirements.
[0096] Assuming the target object corresponds to 3 copies of data, then:
[0097] Rack_Domain can mean that data is not lost when a single rack fails. Taking a target object with three replica data as an example, meeting the Rack_Domain security requirement requires each rack to hold at most two replicas, so the number of storage constraints is two. If a rack fails, at most two replicas become unreadable, leaving at least one readable replica of the target object, which satisfies the Rack_Domain security requirement.
[0098] Rack-Machine_Domain can mean that data is not lost when one rack and one machine in another rack fail at the same time. It offers higher security than Rack_Domain. Taking a target object with three replicas as an example, to implement Rack-Machine_Domain each rack may hold at most one replica; in this case, the number of storage constraints is one. Security requirements are related not only to data distribution but also to the replication strategy. For example, with two replicas, only Rack_Domain can be implemented, not Rack-Machine_Domain.
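The two examples can be generalized in a short sketch. It reads Rack_Domain as tolerating the loss of one rack and Rack-Machine_Domain as tolerating the loss of one rack plus one machine in another rack, and it assumes each machine holds at most one replica of the target object. The formula and the function name `storage_constraint` are assumptions extrapolated from the three-replica and two-replica cases above, not a rule stated by this application.

```python
def storage_constraint(security_requirement, replica_count):
    """Maximum number of replicas of one target object per rack such that
    at least one replica stays readable under the assumed failure model."""
    # Replicas that may be lost outside the failed rack (assumed model):
    # none for Rack_Domain, one machine (one replica) for Rack-Machine_Domain.
    extra_losses = {"Rack_Domain": 0, "Rack-Machine_Domain": 1}
    limit = replica_count - 1 - extra_losses[security_requirement]
    if limit < 1:
        raise ValueError(
            f"{security_requirement} cannot be met with {replica_count} replicas"
        )
    return limit


rack_limit = storage_constraint("Rack_Domain", 3)
rack_machine_limit = storage_constraint("Rack-Machine_Domain", 3)
```

With two replicas the same function accepts Rack_Domain and rejects Rack-Machine_Domain, matching the text.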
[0099] It should be noted that the above are merely examples illustrating possible ways to implement security requirements. In practical applications, in addition to considering racks and data nodes, the area where the racks are located can also be considered to determine the number of storage constraints in each rack. This application does not impose any limitations on this. Of course, in practical applications, the number of storage constraints can also be preset.
[0100] After determining the number of storage constraints, at least two media types and the number of storage constraints can be used to determine the target storage medium corresponding to each of the data to be scheduled, the target data node providing the target storage medium, and the target rack where the target data node is located.
[0101] The control node can comprehensively consider factors such as the rack conditions of each data node, the node's own resource status (such as the remaining storage capacity of the storage medium, current read / write load, etc.), and network connectivity to select target data nodes and target racks. For example, if data node A is located in rack 1, is equipped with an SSD that meets the storage requirements, and the SSD has sufficient remaining storage capacity, and the number of target object replicas currently stored in rack 1 has not reached the storage constraint, then data node A can be selected as the target data node, and correspondingly, rack 1 is the target rack. If rack 1 has already reached the storage constraint, it is necessary to examine the data nodes on other racks and select suitable ones as target data nodes to ensure that the data to be scheduled is stored reasonably while meeting security requirements and media type allocation.
[0102] In some embodiments, determining the target storage medium corresponding to each of the at least two media types can be specifically implemented as follows:
[0103] A target rack is randomly selected from the rack set. For any data to be scheduled, a target data node providing the target storage medium is determined from the target rack to allocate the data to the target rack. If the number of data to be scheduled for a target object allocated to the target rack reaches the storage constraint number, the selected target rack is deleted from the plurality of racks to update the rack set, and the step of randomly selecting a target rack from the rack set is returned to continue execution until the allocation of at least one data to be scheduled is completed. After the allocation of at least one data to be scheduled is completed, the rack set can be restored to the plurality of racks. Thus, at least one target rack corresponding to at least one data to be scheduled can be determined. It should be noted that deleting the selected target rack from the rack set here is an algorithmic operation, not an actual removal of the target rack; it only indicates that the selected target rack will no longer participate in the allocation of subsequent unallocated data to be scheduled.
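A minimal sketch of this loop follows, covering rack selection only (choosing a data node inside the rack is omitted). The function name `allocate_to_racks` and the error raised when the rack set runs empty are assumptions for illustration; the source does not say what happens in that case.

```python
import random


def allocate_to_racks(racks, num_data, constraint, rng=random):
    """Pick a target rack for each piece of data, allowing at most
    `constraint` pieces of the same target object per rack."""
    rack_set = list(racks)   # working copy; "deleting" a rack only affects this list
    placed = {}              # rack -> pieces of this target object already allocated
    targets = []
    for _ in range(num_data):
        if not rack_set:
            raise RuntimeError("not enough racks to satisfy the storage constraint")
        rack = rng.choice(rack_set)
        targets.append(rack)
        placed[rack] = placed.get(rack, 0) + 1
        if placed[rack] >= constraint:
            rack_set.remove(rack)   # rack no longer takes part in this allocation
    return targets


all_racks = ["rack_1", "rack_2", "rack_3"]
targets = allocate_to_racks(all_racks, num_data=3, constraint=1)
```

Because the function works on a copy, the original rack list is unchanged afterwards, which corresponds to restoring the rack set once allocation completes.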
[0104] Optionally, the multiple racks in the rack set may each have a corresponding weight coefficient, and a target rack may be selected from the rack set based on the weight coefficients corresponding to the multiple racks.
[0105] The initial value of the weighting coefficient can be determined based on the number of storage media deployed in the rack. The initial value can be the number of storage media, or it can be a multiple of the number of storage media.
[0106] In embodiments of this application, a target rack may be selected from the rack set according to the initial value of the weighting coefficient.
[0107] After selecting a target rack, for an unallocated data to be scheduled, a target data node providing the target storage medium can be determined from the target rack to allocate the data to be scheduled.
[0108] The target data node can be any data node in the target rack that provides the target storage medium; or it can be a target data node that can provide the target storage medium that meets the storage requirements, determined by strategies such as load balancing.
[0109] In the embodiments of this application, the target rack can be selected randomly according to its weight ratio. For example, racks with higher weight coefficients are more likely to be selected. Through this probabilistic selection mechanism, the data to be scheduled is more likely to be allocated to racks with stronger storage capacity, while also retaining a certain degree of randomness to avoid allocating the data to be scheduled to a few racks with high weights, thus ensuring that the data has a relatively reasonable distribution possibility among multiple racks.
[0110] Selecting a target rack from a set of racks can be achieved, for example, in the following manner:
[0111] First, calculate the sum S of the weight coefficients of all racks; then, generate a random number N in the range [1, S]; then, scan the rack set while accumulating the weight coefficients, and use the first rack at which N is less than or equal to the accumulated weight coefficient as the target rack, so that a rack with a larger weight coefficient is more likely to be selected.
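The selection can be sketched as follows. The sketch assumes the usual cumulative reading of the scan, in which the rack whose weight range contains N is chosen so that the selection probability is proportional to the weight coefficient; it also assumes positive integer weights. The function name `pick_weighted_rack` is hypothetical.

```python
import random


def pick_weighted_rack(weights, rng=random):
    """Select a rack with probability proportional to its weight coefficient."""
    total = sum(weights.values())   # S: sum of all weight coefficients
    n = rng.randint(1, total)       # N: random integer in [1, S], both ends inclusive
    for rack, weight in weights.items():
        if n <= weight:
            return rack             # N falls inside this rack's weight range
        n -= weight                 # otherwise skip past this rack's range
    raise AssertionError("unreachable for positive integer weights")


weights = {"rack_1": 304, "rack_2": 304, "rack_3": 152}
chosen = pick_weighted_rack(weights)
```

With these weights, rack_1 and rack_2 are each chosen with probability 304 / 760 and rack_3 with probability 152 / 760.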
[0112] If the number of data to be scheduled for a target object allocated to a target rack reaches the storage constraint limit, the selected target rack can be deleted from the rack set to update the rack set. Then, the operation of selecting a target rack from the rack set can continue to be performed to allocate the next data to be scheduled.
[0113] In the process of developing this application, the inventors discovered that, after the target rack is determined, a data node is randomly selected from the target rack as the target data node, and because racks are heterogeneous, the number of data nodes in each rack may differ. Therefore, if the target rack is always selected according to the initial value of the weight coefficient, the probability of selecting certain data nodes may increase, so data balance among the data nodes cannot be guaranteed.
[0114] For example, suppose a distributed storage system includes 5 data nodes A, B, C, D, and E, distributed across 3 racks: rack_1 (A, B), rack_2 (C, D), and rack_3 (E). Without rack constraints, the expected selection probability of each data node is 2 / 5. With rack constraints, however, the expected selection probability of data node E is greater than 2 / 5, while the expected selection probabilities of data nodes A, B, C, and D are less than 2 / 5. As a result, the data node in the smaller rack (data node E in rack_3) is allocated a larger number of data chunks, which leads to hotspots and affects service availability, or lowers overall cluster utilization and increases storage costs.
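The imbalance can be checked with exact arithmetic. The sketch below assumes a setup that the example does not spell out: two pieces of data are allocated, each rack may hold at most one of them, a rack is drawn with probability proportional to its number of data nodes, and a data node is then drawn uniformly inside the chosen rack. Under these assumptions the per-node probabilities come out on the sides of 2 / 5 that the example describes.

```python
from fractions import Fraction

racks = {"rack_1": ["A", "B"], "rack_2": ["C", "D"], "rack_3": ["E"]}


def node_probabilities(racks, picks=2):
    """Exact probability that each node receives one of `picks` pieces of data
    when every rack may hold at most one piece (assumed constraint)."""
    prob = {node: Fraction(0) for nodes in racks.values() for node in nodes}

    def walk(remaining, left, p):
        if left == 0:
            return
        total = sum(len(racks[r]) for r in remaining)
        for r in remaining:
            p_rack = p * Fraction(len(racks[r]), total)   # weight = node count
            for node in racks[r]:
                prob[node] += p_rack / len(racks[r])      # uniform inside the rack
            walk([x for x in remaining if x != r], left - 1, p_rack)

    walk(list(racks), picks, Fraction(1))
    return prob


prob = node_probabilities(racks)
```

Here `prob["E"]` evaluates to 7 / 15 and `prob["A"]` to 23 / 60, compared with 2 / 5 for every node when no rack constraint applies.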
[0115] Therefore, in order to ensure data balance, in some embodiments, the method may further include:
[0116] The weight coefficient corresponding to the target rack is attenuated based on the number of data to be scheduled for the target object allocated to the target rack.
[0117] By attenuating the weight coefficient of the target rack, the probability of that rack being selected subsequently can be reduced, thereby reducing the probability of data nodes within that rack being selected. This ensures data balance among data nodes. It prevents a rack from being overloaded with data storage tasks due to a high initial weight coefficient or being selected multiple times in the early stages, leading to increasingly uneven data distribution. For example, if a rack is allocated a large amount of data in a data scheduling session, its weight coefficient, after attenuation, will have a lower probability of being selected in subsequent data scheduling sessions. This guides more data to be scheduled to other less fully utilized racks, ensuring a relatively balanced data distribution throughout the system and improving the overall performance and resource utilization of the distributed storage system.
[0118] In one alternative embodiment of this application, the attenuation of the weight coefficient of the target rack based on the number of data to be scheduled for the target object allocated to the target rack can be specifically implemented as follows: determining the weight coefficient attenuation value corresponding to the number of data to be scheduled allocated to the target rack; and using the weight attenuation value to attenuate the weight coefficient corresponding to the target rack.
[0119] In the embodiments of this application, a target value corresponding to the allocation of a single piece of data can be preset. This target value can be 1. The weight coefficient decay value can be the product of the number of allocated data to be scheduled and the target value. When the target value is 1, that is, when the weight coefficient decays by 1 for each piece of data to be scheduled allocated to the rack, the weight coefficient decay value can be the number of data to be scheduled. For example, if the current weight coefficient of the target rack is 10 and the number of allocated data to be scheduled is 2, then the weight coefficient decay value is 2. When it is necessary to decay the weight coefficient of the target rack, the decayed weight coefficient can be: 10 - 2 = 8.
[0120] In another optional embodiment of this application, the attenuation of the weight coefficient of the target rack based on the number of data to be scheduled for the target object allocated to the target rack can be specifically implemented as follows: determining the weight coefficient attenuation ratio corresponding to the number of data to be scheduled for the target object allocated to the target rack; and attenuating the weight coefficient corresponding to the target rack using the weight coefficient attenuation ratio.
[0121] In the embodiments of this application, a target ratio corresponding to the allocation of a single piece of data can be preset, such as 10%. The weight coefficient decay ratio can be the product of the number of allocated data to be scheduled and the target ratio. For example, if the current weight coefficient of the target rack is 10, the target ratio is assumed to be 10%, and the number of data to be scheduled for the allocated target object is 2, then the weight coefficient decay ratio is 20%, and the decayed weight coefficient can be: 10 × (1 - 20%) = 8.
[0122] Of course, the above-mentioned attenuation of the weight coefficients corresponding to the at least one target rack based on the number of data to be scheduled for the target objects allocated to the at least one target rack can also be achieved by attenuating the weight coefficients corresponding to the target rack by a target value or target proportion whenever a data to be scheduled is allocated to each target rack.
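Both attenuation variants can be written as small functions and checked against the worked numbers above. The function names are hypothetical, and clamping the result at zero is an added assumption, since the text does not say what happens when the decay exceeds the current weight.

```python
def decay_by_value(weight, allocated, target_value=1):
    """Subtract target_value for every piece of data allocated to the rack."""
    return max(weight - allocated * target_value, 0)   # clamp at 0 (assumption)


def decay_by_ratio(weight, allocated, target_ratio=0.10):
    """Reduce the weight by target_ratio for every piece of data allocated."""
    ratio = min(allocated * target_ratio, 1.0)          # cap at 100% (assumption)
    return weight * (1 - ratio)


by_value = decay_by_value(10, 2)    # 10 - 2
by_ratio = decay_by_ratio(10, 2)    # 10 * (1 - 20%)
```

The two variants coincide in this example but diverge as the weight changes: the value-based decay removes a fixed amount per allocation, while the ratio-based decay removes an amount proportional to the current weight.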
[0123] In embodiments of this application, the method may further include:
[0124] Based on the total amount of data allocated to the multiple racks, calculate the allocation values corresponding to the multiple racks; calculate the weight sum value corresponding to the initial value of the weight coefficients corresponding to the multiple racks; when the ratio of the allocation value to the weight sum value reaches a predetermined value, update the weight coefficients corresponding to the multiple racks to the initial value.
[0125] The total data volume can refer to the total number of data to be scheduled corresponding to all objects that have been allocated to multiple racks, while the allocation value can be a multiple of the total data volume. Optionally, the multiple can be 1, and the allocation value can be the same as the total data volume. For example, after each data to be scheduled is allocated to a rack, the allocation value can be incremented by 1.
[0126] Following the example above, suppose the distributed storage system includes 5 data nodes: A, B, C, D, and E, distributed across 3 racks: rack_1 (A, B), rack_2 (C, D), and rack_3 (E). Each data node has 76 storage media. The initial weight coefficient for rack_1 is (76 + 76) * 2 = 304; for rack_2 it is (76 + 76) * 2 = 304; and for rack_3 it is 76 * 2 = 152. Therefore, the weight sum value is 304 + 304 + 152 = 760. For each piece of data allocated to a rack, the allocation value is incremented by 2. Assuming a predetermined value of 0.95, when the allocation value reaches 722, the ratio of the allocation value to the weight sum value is 722 / 760 = 0.95, which is greater than or equal to the predetermined value. In this case, the weight coefficients can be updated to their initial values.
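The reset rule can be sketched as a small bookkeeping class. The sketch reads the example as follows: the allocation value grows by 2 per allocated piece of data, and the weights are restored once the allocation value divided by the sum of the initial weights (760) reaches 0.95. The class name `RackWeights`, the per-allocation decay of 1, and resetting the allocation value to 0 after a restore are assumptions for illustration.

```python
class RackWeights:
    """Tracks decayed rack weights and restores them to their initial values."""

    def __init__(self, initial, increment=1, reset_ratio=0.95):
        self.initial = dict(initial)
        self.current = dict(initial)
        self.increment = increment      # allocation value added per piece of data
        self.reset_ratio = reset_ratio  # the predetermined value
        self.allocation_value = 0

    def record_allocation(self, rack, decay=1):
        """Decay the rack's weight; return True if all weights were reset."""
        self.current[rack] = max(self.current[rack] - decay, 0)
        self.allocation_value += self.increment
        weight_sum = sum(self.initial.values())
        if self.allocation_value / weight_sum >= self.reset_ratio:
            self.current = dict(self.initial)
            self.allocation_value = 0   # start a new period (assumption)
            return True
        return False


tracker = RackWeights({"rack_1": 304, "rack_2": 304, "rack_3": 152}, increment=2)
names = ["rack_1", "rack_2", "rack_3"]
resets = [tracker.record_allocation(names[i % 3]) for i in range(361)]
```

With an increment of 2, the 361st allocation brings the allocation value to 722, so only the last entry of `resets` is true.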
[0127] In the embodiments of this application, by updating the weight coefficients of multiple racks to their initial values, a periodic and dynamically balanced weight adjustment mechanism can be implemented to ensure that data scheduling proceeds normally and to guarantee the stable and efficient operation of the entire storage system.
[0128] In some embodiments, in error correction code scenarios, at least one data to be scheduled may include a data block of the target object and a check block.
[0129] The above-mentioned determination of at least two media types corresponding to the target object from multiple media types can be specifically implemented as follows: determining the storage method corresponding to the target object from multiple media types; the storage method includes at least two media types and the storage priorities corresponding to at least two media types respectively.
[0130] The determination of the target storage medium corresponding to each of the at least two media types, as described above, may include:
[0131] Based on at least two media types, target storage media of a first media type are determined for a first number of data blocks, target storage media of a second media type are determined for a second number of data blocks, and target storage media of a third media type are determined for the multiple check blocks. The storage priority of the first media type is higher than that of the second and third media types. The storage priority can be determined by combining storage performance and / or storage cost. The first number differs from the second number; the first number of data blocks and the second number of data blocks together constitute the multiple data blocks.
[0132] As described above, the service provider can configure the storage priority corresponding to each media type. In the embodiments of this application, the control node can first determine the first media type according to the storage priority from high to low. A first number of data blocks can be stored in the storage medium of the first media type. Then, the second media type is determined, and a second number of data blocks can be stored in the storage medium of the second media type.
[0133] For check blocks, since they are primarily used for data integrity verification, the read / write speed requirements are typically not as high as for data blocks. Therefore, they can be stored on a third media type with a relatively lower priority. Optionally, the third media type can be the same as the second media type, or it can be a different media type than the second media type, such as magnetic tape. The control node can assign multiple check blocks to target storage media of the third media type.
[0134] In this way, the control node can rationally allocate multiple data blocks and multiple check blocks of the target object to the target storage media of different storage media types according to the storage priority determined by storage performance and / or storage cost, so as to achieve efficient, economical and reliable storage of data in the distributed storage system, and fully take into account the needs of performance, cost and data integrity guarantee in the storage process.
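A minimal sketch of this block placement follows. The block names, the 4 data + 2 check layout, and the choice of one data block on SSD are hypothetical values chosen for illustration; the function name `place_blocks` is likewise an assumption.

```python
def place_blocks(data_blocks, check_blocks, first_type, first_count,
                 second_type, third_type):
    """Map each block to a media type: the first `first_count` data blocks go to
    the first (highest-priority) media type, the remaining data blocks to the
    second media type, and all check blocks to the third media type."""
    placement = {}
    for index, block in enumerate(data_blocks):
        placement[block] = first_type if index < first_count else second_type
    for block in check_blocks:
        placement[block] = third_type
    return placement


# Hypothetical layout: 4 data blocks and 2 check blocks.
placement = place_blocks(
    data_blocks=["d0", "d1", "d2", "d3"],
    check_blocks=["c0", "c1"],
    first_type="SSD", first_count=1,
    second_type="HDD", third_type="HDD",   # third type may equal the second
)
```

Here one data block lands on SSD and three on HDD, so the first number differs from the second number as the embodiment requires.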
[0135] In some embodiments, the method may further include: obtaining a data scheduling request sent by the service provider.
[0136] The above-mentioned storage of at least one data to be scheduled into its corresponding target storage medium in the target data node can be specifically implemented as follows: the service provider stores at least one data to be scheduled into its corresponding target storage medium in the target data node. Optionally, the service provider may be provided with the target storage medium and target data node corresponding to at least one data to be scheduled, so that the service provider can store at least one data to be scheduled into its corresponding target storage medium in the target data node.
[0137] In the embodiments of this application, after receiving a data scheduling request from the service provider, the control node can determine the target storage medium corresponding to each piece of data to be scheduled and the target data node providing the target storage medium. Then, the control node can feed back the storage location information of each piece of data to be scheduled, i.e., the target storage medium and the target data node providing the target storage medium, to the service provider. For example, after the control node determines that the target storage medium for a certain piece of data to be scheduled is an SSD on data node A, it can send location information such as "data to be scheduled X, target storage medium is an SSD on data node A" to the service provider.
[0138] In one possible implementation of this application, the location information can be fed back in various forms. For example, it can be returned as structured data through an application programming interface (API), such as a JSON array in which each element describes one piece of data to be scheduled, its corresponding target storage medium, and the target data node; or it can be delivered through a message queue, from which the service provider can obtain the corresponding feedback content.
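As an illustration of the structured form, the following sketch serializes location information to a JSON array and parses it back. The field names `data`, `media_type`, and `data_node` are hypothetical; this application does not define a schema.

```python
import json

# One entry per piece of data to be scheduled (hypothetical field names).
locations = [
    {"data": "X", "media_type": "SSD", "data_node": "A"},
    {"data": "Y", "media_type": "HDD", "data_node": "B"},
    {"data": "Z", "media_type": "HDD", "data_node": "C"},
]

payload = json.dumps(locations)   # what the control node could return
decoded = json.loads(payload)     # what the service provider would parse
```

The service provider could then iterate over `decoded` and send each piece of data to the listed data node.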
[0139] After receiving location information from the control node, the service provider can initiate its own data transmission and storage processes based on that information. The service provider can utilize its network connectivity to send the data to be scheduled to the corresponding target data node. For example, if the service provider is a cloud storage client application, it can establish a network connection with the target data node (which may be located within the cloud storage data center) according to the feedback information (e.g., using HTTP, HTTPS, or other protocols over the internet), and then upload the local data to be scheduled to the specified target storage medium on the target data node.
[0140] In some embodiments, the method further includes: obtaining a data acquisition request for the target object; determining a target storage medium with a high read priority corresponding to the target object according to the read priorities corresponding to at least two media types; and reading the data corresponding to the target object from the target storage medium.
[0141] In the embodiments of this application, the control node can determine the target storage medium with high read priority based on pre-defined read priority rules for different media types. The read priority can be the same as the storage priority mentioned above, but is not limited to this. The read priority can also be set according to read performance requirements; for example, for target objects sensitive to read latency, a higher priority can be set for fast media types. By using read priority, read performance can be guaranteed. Optionally, in the event of a failure of the target storage medium with high read priority, the data corresponding to the target object can be read from the target storage medium with the next higher read priority to further ensure service availability.
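The read path with failover can be sketched as below. Modelling a failure as a set of unavailable data nodes, the record layout, and the function name `pick_read_source` are assumptions made for this example.

```python
def pick_read_source(locations, read_priority, failed_nodes=()):
    """Return the replica location on the highest-read-priority media type
    that is still available, or None if every replica is unavailable."""
    for media in read_priority:               # high to low read priority
        for loc in locations:
            if loc["media_type"] == media and loc["data_node"] not in failed_nodes:
                return loc
    return None


replicas = [
    {"media_type": "HDD", "data_node": "B"},
    {"media_type": "SSD", "data_node": "A"},
    {"media_type": "HDD", "data_node": "C"},
]
normal = pick_read_source(replicas, ["SSD", "HDD"])
fallback = pick_read_source(replicas, ["SSD", "HDD"], failed_nodes={"A"})
```

When node A is unavailable, the read falls back to the first HDD replica, which keeps the target object readable at lower read performance.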
[0142] In some embodiments, the method may further include providing the service provider with multiple media types.
[0143] The above-mentioned determination of at least two media types corresponding to the target object from multiple media types can be specifically implemented as: obtaining at least two media types selected by the service provider from multiple media types.
[0144] In the embodiments of this application, the control node provides the service provider with information on various media types, enabling the service provider to fully understand the various storage media options available for storing data. This allows the service provider to make more appropriate storage decisions based on its own storage needs, data characteristics, and cost considerations.
[0145] In addition, the service provider can configure the storage priority and read priority of different media types, as well as the amount of data to be scheduled stored in each media type.
[0146] By configuring storage priorities, read priorities, and the amount of data to be scheduled stored on each media type, the service provider can generate a storage method for the data to be scheduled of a target object. The storage method can indicate which media types are used to store the data, the priority with which each media type is written and read, and the amount of data stored on each media type. Through the embodiments of this application, the service provider can independently select media types according to its own needs, improving service flexibility while ensuring service availability.
[0147] In some embodiments, the method may further include: generating a data scheduling request if any replica data of the target object is detected to be abnormal.
[0148] Specifically, determining at least one data to be scheduled corresponding to the target object can be achieved by using replica data of the target object as the data to be scheduled.
[0149] In embodiments of this application, when any copy data anomaly is detected, the control node needs to take timely measures to repair and adjust it in order to ensure data integrity, redundancy, and availability. In this case, a data scheduling request can be generated. The data scheduling request is used to request the replacement or repair of the abnormal copy data by scheduling other normal copy data or regenerating a copy, so that the number of copies and data content of the target object on various storage media and data nodes can be restored to a normal state, ensuring that the data can still be reliably accessed and used. For example, in a distributed storage system storing important enterprise document data, if an anomaly is detected in the copy data on a certain data node, after generating a data scheduling request, the system can schedule normal copy data on other data nodes to cover the abnormal copy, or regenerate a copy based on the original data and store it in a suitable location to maintain the redundant backup state of the data.
[0150] Figure 2 is a block diagram of a distributed storage system according to an embodiment of this application. The distributed storage system provides storage media of multiple media types; the distributed storage system includes a control node and multiple data nodes; each of the multiple data nodes deploys storage media of at least one media type. The distributed storage system includes:
[0151] Control node 201 is used to respond to a data scheduling request to determine at least one data to be scheduled corresponding to a target object; determine at least two media types corresponding to the target object from multiple media types; and determine, according to the at least two media types, the target storage medium corresponding to each of the at least one data to be scheduled and the target data node 202 that provides the target storage medium.
[0152] Data node 202 is used to store any allocated data to be scheduled into the target storage medium.
[0153] In practical applications, distributed storage systems can provide storage services to service providers. When a distributed storage system is used as the underlying storage system, the service provider can refer to a storage service system built on top of the distributed storage system, such as OSS (Object Storage Service) or EBS (Elastic Block Store). Of course, a distributed storage system can also be used as a user-facing storage system, in which case the service provider is the user.
[0154] For the specific implementation of the control node and the data node, refer to the data scheduling method shown in Figure 1; the details are not repeated here.
[0155] To facilitate understanding, the technical solution of the embodiments of this application is introduced below with reference to the scenario interaction diagram shown in Figure 3.
[0156] As shown in Figure 3, service provider 301 can send a data scheduling request to distributed storage system 302, and distributed storage system 302 can receive the data scheduling request through control node 201.
[0157] In response to a data scheduling request, control node 201 can determine at least one piece of data to be scheduled corresponding to the target object. In this example, the target object may correspond to four pieces of data to be scheduled, and these four pieces of data to be scheduled may be replicas of the target object.
[0158] After determining the four copies of the target object's data, control node 201 can select at least two media types from the various media types provided by distributed storage system 302 to store the four copies. In this example, control node 201 can select at least two media types for the four copies based on a pre-defined storage method. For example, the storage method can indicate both SSD and HDD media types, with SSD storing three copies and HDD storing one copy.
[0159] Based on this, control node 201 can determine the target storage medium corresponding to each of the four data items to be scheduled, as well as the target data node providing the target storage medium, according to the two media types mentioned above. In this example, control node 201 can, for example, determine data nodes 3021, 3022, and 3023 as target data nodes. Specifically, SSD type storage media 3024 and 3025 on data node 3021 can be target storage media, used to store replica data 1 and replica data 2, respectively; SSD type storage media 3026 on data node 3022 can be target storage media, used to store replica data 3; and HDD type storage media 3027 on data node 3023 can be target storage media, used to store replica data 4.
[0160] Of course, each copy of data can also be assigned to a storage medium of the corresponding SSD type if the remaining storage resources of the SSD meet the storage conditions. If the storage conditions are not met, it can be downgraded to a storage medium of the HDD type.
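The Figure 3 example (three copies on SSD, one on HDD, with a downgrade to HDD when SSD resources are insufficient) can be sketched as below. This is a minimal illustration: the function name `assign_replicas`, the slot-based capacity model, and the single fallback type are assumptions for the example rather than details given in this application.

```python
from typing import Dict, List


def assign_replicas(copies_per_type: Dict[str, int],
                    free_slots: Dict[str, int],
                    fallback: str = "HDD") -> List[str]:
    """Return the media type chosen for each copy, in order.

    copies_per_type: desired copy count per media type, e.g. {"SSD": 3, "HDD": 1}.
    free_slots: how many copies each media type can still accept
                (assumed capacity model).
    A copy whose preferred media type is full is downgraded to `fallback`.
    """
    remaining = dict(free_slots)  # do not mutate the caller's dictionary
    placement: List[str] = []
    for media_type, count in copies_per_type.items():
        for _ in range(count):
            chosen = media_type if remaining.get(media_type, 0) > 0 else fallback
            if remaining.get(chosen, 0) <= 0:
                raise RuntimeError("no media type can accept the copy")
            remaining[chosen] -= 1
            placement.append(chosen)
    return placement
```

With enough SSD capacity, `assign_replicas({"SSD": 3, "HDD": 1}, {"SSD": 3, "HDD": 4})` yields three SSD placements followed by one HDD placement; if only two SSD slots remain, the third copy is downgraded to HDD.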
[0161] Once the target storage medium and the target data node providing the target storage medium are determined, control node 201 can feed back the storage information to service provider 301, and service provider 301 can store the at least one data to be scheduled of the target object in the target storage medium deployed on the target data node.
[0162] Figure 4 shows a block diagram of a data scheduling apparatus according to an embodiment of the present application. The data scheduling apparatus can be applied to a distributed storage system that provides storage media of multiple media types. The distributed storage system includes multiple data nodes; each of the multiple data nodes deploys storage media of at least one media type. The apparatus includes:
[0163] The data determination module 401 is used to determine at least one data to be scheduled corresponding to the target object in response to a data scheduling request.
[0164] The first determining module 402 is used to determine at least two media types corresponding to the target object from multiple media types;
[0165] The second determining module 403 is used to determine the target storage medium corresponding to at least one data to be scheduled and the target data node providing the target storage medium according to at least two media types.
[0166] Storage module 404 is used to store at least one data to be scheduled into the target storage medium in the corresponding target data node.
[0167] In some embodiments, the second determining module 403 is specifically used to: for any data to be scheduled, determine, in descending order of storage priority, whether the remaining storage resources of each media type corresponding to the target object meet the storage conditions, until a target media type whose remaining storage resources meet the storage conditions is obtained; and determine, for the data to be scheduled, a target storage medium that belongs to the target media type.
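The priority walk performed by the second determining module can be sketched as a short loop. The function name `pick_media_type` and the byte-based storage condition (`free_bytes >= needed_bytes`) are assumptions for illustration; the application does not fix a concrete storage condition here.

```python
from typing import Dict, List, Optional


def pick_media_type(types_by_priority: List[str],
                    free_bytes: Dict[str, int],
                    needed_bytes: int) -> Optional[str]:
    """Walk media types from highest to lowest storage priority.

    Returns the first media type whose remaining storage resources meet the
    storage condition (assumed: free_bytes >= needed_bytes), or None if no
    media type qualifies.
    """
    for media_type in types_by_priority:
        if free_bytes.get(media_type, 0) >= needed_bytes:
            return media_type
    return None
```

Because the list is already ordered by storage priority, the loop works unchanged for two media types or for more.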
[0168] In some embodiments, at least one data to be scheduled is multiple copies of the target object.
[0169] In some embodiments, the first determining module 402 is specifically used to: determine the storage method corresponding to the target object from multiple media types; the storage method includes at least two media types and the number of copies corresponding to at least two media types respectively.
[0170] In some embodiments, the second determining module 403 is specifically used for:
[0171] Based on at least two media types and the number of copies corresponding to each of the at least two media types, determine the target storage media corresponding to the multiple copies of data and the target data nodes that provide the target storage media.
[0172] In some embodiments, at least one data to be scheduled is a plurality of data blocks and a plurality of check blocks divided from the target object.
[0173] In some embodiments, the first determining module 402 is specifically used to: determine the storage method corresponding to the target object from multiple media types; the storage method includes at least two media types and storage priorities corresponding to the at least two media types respectively; according to the at least two media types, determine the target storage medium of the first media type corresponding to the first number of data blocks, the target storage medium of the second media type corresponding to the second number of data blocks respectively, and the target storage medium of the third media type corresponding to the multiple check blocks respectively; the storage priority of the first media type is higher than that of the second media type and the third media type; wherein, the storage priority is determined in combination with storage performance and / or storage cost.
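For the case where the data to be scheduled are data blocks and check blocks, the placement rule above can be sketched as follows. The function `place_ec_blocks`, the block naming scheme, and the example counts are hypothetical; the sketch also assumes that the second and third media types may be the same type, which this paragraph does not state explicitly.

```python
from typing import Dict


def place_ec_blocks(num_data: int, num_check: int, first_count: int,
                    first_type: str, second_type: str,
                    third_type: str) -> Dict[str, str]:
    """Map each block name to a media type.

    The first `first_count` data blocks go to `first_type` (highest storage
    priority), the remaining data blocks go to `second_type`, and all check
    blocks go to `third_type`.
    """
    if not 0 <= first_count <= num_data:
        raise ValueError("first_count must be between 0 and num_data")
    placement: Dict[str, str] = {}
    for i in range(num_data):
        placement[f"data-{i}"] = first_type if i < first_count else second_type
    for j in range(num_check):
        placement[f"check-{j}"] = third_type
    return placement
```

For example, with four data blocks and two check blocks, `place_ec_blocks(4, 2, 3, "SSD", "HDD", "HDD")` keeps three data blocks on SSD and puts the fourth data block and both check blocks on HDD.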
[0174] In some embodiments, multiple data nodes are distributed across multiple racks.
[0175] In some embodiments, the second determining module 403 may specifically be used to: determine the number of storage constraints in any rack according to the security requirements corresponding to the target object; and determine the target storage medium corresponding to at least one data to be scheduled, the target data node providing the target storage medium, and the target rack where the target data node is located, according to at least two media types and the number of storage constraints.
[0176] In some embodiments, the device may further include:
[0177] The first request acquisition module is used to acquire data scheduling requests sent by the service provider;
[0178] In some embodiments, the storage module 404 is specifically used to: store at least one data to be scheduled into the target storage medium in the respective target data node through the service provider.
[0179] In some embodiments, the device may further include:
[0180] The second request acquisition module is used to acquire data acquisition requests for the target object;
[0181] The media determination module is used to determine the target storage medium with high read priority corresponding to the target object according to the read priorities corresponding to at least two media types.
[0182] The data reading module is used to read the data corresponding to the target object from the target storage medium.
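The read path formed by these three modules can be illustrated with a small selection function. This is a sketch under assumptions: `pick_read_media` is a hypothetical name, and a lower number is assumed to mean a higher read priority.

```python
from typing import Dict, Optional, Set


def pick_read_media(read_priority: Dict[str, int],
                    types_holding_data: Set[str]) -> Optional[str]:
    """Return the media type with the highest read priority that holds the data.

    Assumption: a lower number means a higher read priority. Media types
    without a configured read priority are ignored.
    """
    candidates = [t for t in types_holding_data if t in read_priority]
    if not candidates:
        return None
    return min(candidates, key=lambda t: read_priority[t])
```

Restricting the choice to media types that actually hold the data lets the read fall through to a lower-priority type when the higher-priority copy is unavailable.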
[0183] In some embodiments, the device may further include:
[0184] The media provision module is used to provide various media types to the service provider;
[0185] In some embodiments, the first determining module 402 is specifically used to: obtain at least two media types selected by the service provider from a variety of media types.
[0186] In some embodiments, the device may further include:
[0187] The anomaly detection module is used to generate a data scheduling request when any replica data corresponding to the target is detected to be abnormal.
[0188] In some embodiments, the data determination module 401 is specifically used to: determine a copy of the target object's data.
[0189] In some embodiments, the second determining module 403 is specifically used for:
[0190] Randomly select a target rack from the rack set;
[0191] For any data to be scheduled, a target data node providing the target storage medium is determined from the target rack to allocate the data to be scheduled.
[0192] If the amount of data to be scheduled allocated to the target rack reaches the storage constraint, the selected target rack is removed from the plurality of racks to update the rack set.
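The rack selection loop described above can be sketched as follows. This is one possible reading, in which a rack is drawn at random for each piece of data; the function name `allocate_to_racks`, the `limit` parameter standing for the storage constraint, and the omission of data node selection within the rack are all simplifying assumptions.

```python
import random
from typing import Dict, List, Optional


def allocate_to_racks(items: List[str], racks: List[str], limit: int,
                      rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Assign each item to a rack so that no rack holds more than `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    rng = rng or random.Random()
    rack_set = list(racks)  # working copy, so `racks` itself stays intact
    used = {rack: 0 for rack in racks}
    result: Dict[str, str] = {}
    for item in items:
        if not rack_set:
            raise RuntimeError("all racks have reached the storage constraint")
        target = rng.choice(rack_set)  # randomly select a target rack
        result[item] = target
        used[target] += 1
        if used[target] >= limit:
            rack_set.remove(target)    # rack is full: update the rack set
    return result
```

Working on a copy of the rack list means the full set of racks is available again for the next target object without an explicit restore step.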
[0193] The data scheduling device shown in Figure 4 can perform the data scheduling method described in the embodiment shown in Figure 1; its implementation principle and technical effects are not repeated here. The specific manner in which each module and unit of the data scheduling device in the above embodiments performs its operations has been described in detail in the embodiments related to the method and is not elaborated upon here.
[0194] This application also provides a computing device. As shown in Figure 5, the device may include a storage component 501 and a processing component 502.
[0195] The storage component 501 is used to store one or more computer instructions, wherein the one or more computer instructions are called and executed by the processing component to implement the data scheduling method provided in the embodiments of this application.
[0196] Of course, computing devices may also include other components, such as input / output interfaces, display components, communication components, etc.
[0197] Input / output interfaces provide interfaces between processing components and peripheral interface modules, which can be output devices, input devices, etc. Communication components are configured to facilitate wired or wireless communication between computing devices and other devices.
[0198] The processing component may include one or more processors to execute computer instructions to complete all or part of the steps in the above-described method. Alternatively, the processing component may be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method.
[0199] Storage components are configured to store various types of data to support operations on the computing device. Storage components can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0200] The display component can be an electroluminescent (EL) element, a liquid crystal display or a microdisplay with a similar structure, or a retina-direct display or a similar laser scanning display.
[0201] It should be noted that the aforementioned computing devices can be physical devices or elastic computing hosts provided by cloud computing platforms. They can be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or a single terminal device.
[0202] This application also provides a computer-readable storage medium storing a computer program which, when executed by a computer, can implement the data scheduling method of the embodiment shown in Figure 1. The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist independently without being assembled into the electronic device.
[0203] This application also provides a computer program product, which includes a computer program carried on a computer-readable storage medium; when executed by a computer, the computer program can implement the data scheduling method of the embodiment shown in Figure 1. In such an embodiment, the computer program may be downloaded and installed from a network, and / or installed from a removable medium. When the computer program is executed by a processor, it performs various functions defined in the system of this application.
[0204] Those skilled in the art will clearly understand that, for the sake of convenience and brevity, the specific working processes of the systems, devices, and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be repeated here.
[0205] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0206] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0207] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A data scheduling method, characterized in that, The method is applied to a distributed storage system that provides storage media of multiple media types; the distributed storage system includes multiple data nodes; each of the multiple data nodes deploys at least one media type of storage media; the method includes: In response to a data scheduling request, determine at least one piece of data to be scheduled corresponding to the target object; From the multiple media types, at least two media types corresponding to the target object are determined; Based on the at least two media types, determine the target storage medium corresponding to each of the at least one data to be scheduled and determine the target data node that provides the target storage medium; The at least one data to be scheduled is stored in the target storage medium of its respective target data node.
2. The method according to claim 1, characterized in that, The step of determining, based on the at least two media types, the target storage medium corresponding to each of the at least one data to be scheduled includes: For any data to be scheduled, in descending order of storage priority, determine whether the remaining storage resources of each media type corresponding to the target object meet the storage conditions, until a target media type whose remaining storage resources meet the storage conditions is obtained; Determine the target storage medium that belongs to the target media type and corresponds to the data to be scheduled.
3. The method according to claim 1, characterized in that, The at least two media types include a fast media type and a slow media type; the storage priority of the fast media type is higher than the storage priority of the slow media type; The step of determining, based on the at least two media types, the target storage medium corresponding to each of the at least one data to be scheduled includes: For any data to be scheduled, determine whether the remaining storage resources of the fast media type corresponding to the target object meet the storage conditions; If so, determine that the target media type corresponding to the data to be scheduled is the fast media type; If not, determine that the target media type corresponding to the data to be scheduled is the slow media type; Determine the target storage medium that belongs to the target media type and corresponds to the data to be scheduled.
4. The method according to claim 1, characterized in that, The at least one data to be scheduled is multiple copies of the target object; The step of determining at least two media types corresponding to the target object from the multiple media types includes: From the multiple media types, determine the storage method corresponding to the target object; the storage method includes the at least two media types and the number of copies corresponding to each of the at least two media types; The step of determining, based on the at least two media types, the target storage medium corresponding to each of the at least one data to be scheduled and the target data node providing the target storage medium includes: Based on the at least two media types and the number of copies corresponding to each of the at least two media types, determine the target storage media corresponding to the multiple copies and the target data nodes providing the target storage media.
5. The method according to claim 1, characterized in that, The at least one data to be scheduled consists of multiple data blocks and multiple check blocks divided from the target object; The step of determining at least two media types corresponding to the target object from the multiple media types includes: From the multiple media types, determine the storage method corresponding to the target object; the storage method includes the at least two media types and the storage priorities corresponding to the at least two media types respectively; The step of determining, based on the at least two media types, the target storage medium corresponding to each of the at least one data to be scheduled includes: According to the storage method, determine that a first number of data blocks among the multiple data blocks correspond to a target storage medium of a first media type, a second number of data blocks among the multiple data blocks correspond to a target storage medium of a second media type, and the multiple check blocks correspond to a target storage medium of a third media type; wherein, the storage priority of the first media type is higher than that of the second media type and the third media type; the first number is different from the second number.
6. The method according to claim 1, characterized in that, The multiple data nodes are distributed across multiple racks; The step of determining, based on the at least two media types, the target storage medium corresponding to each of the at least one data to be scheduled and determining the target data node providing the target storage medium includes: Based on the security requirements corresponding to the target object, determine the number of storage constraints in any rack; Based on the at least two media types and the number of storage constraints, determine the target storage medium corresponding to each of the at least one data to be scheduled, the target data node providing the target storage medium, and the target rack where the target data node is located.
7. The method according to claim 6, characterized in that, The multiple racks form a rack set; the step of determining, based on the at least two media types and the number of storage constraints, the target storage medium corresponding to each of the at least one data to be scheduled, the target data node providing the target storage medium, and the target rack where the target data node is located includes: Randomly select a target rack from the rack set; For any data to be scheduled, determine, from the target rack, a target data node providing the target storage medium, so as to allocate the data to be scheduled; If the number of data to be scheduled allocated to the target rack reaches the number of storage constraints, remove the selected target rack from the rack set to update the rack set, and return to the step of randomly selecting a target rack from the rack set, until the allocation of the at least one data to be scheduled is completed; After the allocation of the at least one data to be scheduled is completed, restore the rack set to the multiple racks.
8. The method according to claim 1, characterized in that, Also includes: Obtain the data scheduling request sent by the service provider; The step of storing the at least one data to be scheduled into the target storage medium in its corresponding target data node includes: The service provider stores the at least one data to be scheduled into the target storage medium in its respective target data node.
9. The method according to claim 8, characterized in that, Also includes: Obtain a data retrieval request for the target object; Based on the read priorities corresponding to the at least two media types, determine the target storage medium with the high read priority corresponding to the target object; Read the data corresponding to the target object from the target storage medium.
10. The method according to claim 8, characterized in that, Also includes: Provide the service provider with the multiple media types; The step of determining at least two media types corresponding to the target object from the multiple media types includes: Obtain the at least two media types selected by the service provider from the multiple media types.
11. The method according to claim 1, characterized in that, Also includes: If any copy of the target object is found to be abnormal, the data scheduling request is generated. The step of determining at least one data to be scheduled corresponding to the target object includes: Use the copy data of the target object as the data to be scheduled.
12. A distributed storage system, characterized in that, The distributed storage system provides storage media of various media types; the distributed storage system includes a control node and multiple data nodes; each of the multiple data nodes deploys at least one type of storage media. The control node is used to determine at least one piece of data to be scheduled corresponding to the target object in response to a data scheduling request. From the multiple media types, at least two media types corresponding to the target object are determined; Based on the at least two media types, determine the target storage medium corresponding to each of the at least one data to be scheduled and determine the target data node that provides the target storage medium; The data node is used to store any allocated data to be scheduled into the corresponding target storage medium.
13. A computing device, characterized in that, This includes processing components and storage components; The storage component stores one or more computer instructions; the one or more computer instructions are invoked and executed by the processing component to implement the data scheduling method as described in any one of claims 1 to 11.
14. A computer-readable storage medium, characterized in that, It stores a computer program, which, when executed by a processing component, implements the data scheduling method as described in any one of claims 1 to 11.
15. A computer program product, characterized in that, Includes a computer program / instruction that, when executed by a processing component, implements the data scheduling method as described in any one of claims 1 to 11.