Data scheduling method, distributed storage system, storage medium and computer program product
By configuring initial weight coefficients for racks and decaying these weight coefficients during data scheduling, the data imbalance problem caused by rack heterogeneity in distributed storage systems is solved, improving system availability and performance.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- ALIBABA CLOUD COMPUTING CO LTD
- Filing Date
- 2024-12-27
- Publication Date
- 2026-06-30
Figure CN122309101A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of storage technology, and in particular to a data scheduling method, a distributed storage system, a computing device, a storage medium, and a computer program product.
Background Technology
[0002] A distributed storage system is a storage system that distributes data across multiple independent data nodes. Data nodes typically use storage media as the physical carrier for storing data, and the multiple data nodes in a distributed storage system are usually distributed across multiple racks.
[0003] To ensure system performance and reliability, it is usually required that the amount of data written to all data nodes in a distributed storage system be balanced. However, in practical applications, multiple data nodes in a distributed storage system may be deployed on the same rack, and the number of data nodes deployed on each rack may be different.
[0004] In existing technologies, when data nodes are selected for data to be scheduled, racks are first selected at random, and data nodes are then selected from those racks. However, this approach may increase the likelihood of certain data nodes being selected, so that those nodes are allocated a disproportionate amount of data and become hotspots. As a result, data balance cannot be guaranteed, system utilization decreases, and service availability is reduced.
Summary of the Invention
[0005] This application provides a data scheduling method, a distributed storage system, a computing device, a storage medium, and a computer program product to solve the technical problem of poor service availability in distributed storage systems.
[0006] In a first aspect, this application provides a data scheduling method applied to a distributed storage system; the distributed storage system includes multiple data nodes; the multiple data nodes are distributed across multiple racks, and the method includes:
[0007] In response to a data scheduling request, determine at least one piece of data to be scheduled for a target object;
[0008] Determine the weight coefficients corresponding to the plurality of racks, where the initial value of each weight coefficient is determined based on the number of data nodes deployed in the corresponding rack;
[0009] Based on the weight coefficients corresponding to the plurality of racks, determine at least one target rack to which the at least one piece of data to be scheduled is allocated;
[0010] Based on the number of pieces of data to be scheduled of the target object allocated to each of the at least one target rack, attenuate the weight coefficient corresponding to each of the at least one target rack.
[0011] In a second aspect, this application provides a distributed storage system, which includes a control node and multiple data nodes; the multiple data nodes are distributed across multiple racks.
[0012] The control node is configured to, in response to a data scheduling request, determine at least one data to be scheduled for a target object; determine weight coefficients corresponding to the plurality of racks respectively; the initial value of the weight coefficients is determined based on the number of data nodes deployed in the racks; determine at least one target rack to allocate the at least one data to be scheduled based on the weight coefficients corresponding to the plurality of racks respectively; and attenuate the weight coefficients corresponding to the at least one target rack respectively based on the number of data to be scheduled for the target object allocated to the at least one target rack respectively.
[0013] Each data node is configured to store any data to be scheduled that is allocated to it in the corresponding storage medium.
[0014] In a third aspect, this application provides a data scheduling device applied to a distributed storage system; the distributed storage system includes multiple data nodes; the multiple data nodes are distributed across multiple racks, and the device includes:
[0015] The first determining module is used to determine at least one data to be scheduled for the target object in response to a data scheduling request;
[0016] The second determining module is used to determine the weight coefficients corresponding to the plurality of racks respectively; the initial value of the weight coefficients is determined according to the number of data nodes deployed in the racks;
[0017] The third determining module is used to determine at least one target rack to allocate the at least one data to be scheduled based on the weight coefficients corresponding to the multiple racks respectively;
[0018] The attenuation module is used to attenuate the weight coefficients corresponding to the at least one target rack according to the number of data to be scheduled for the target object allocated to the at least one target rack respectively.
[0019] In a fourth aspect, this application provides a computing device, including a processing component and a storage component;
[0020] The storage component stores one or more computer instructions; the one or more computer instructions are invoked and executed by the processing component to implement the data scheduling method provided in the embodiments of this application.
[0021] In a fifth aspect, this application provides a computer-readable storage medium storing a computer program which, when executed by a processing component, implements the data scheduling method provided in this application.
[0022] In a sixth aspect, this application provides a computer program product, including a computer program/instructions which, when executed by a processing component, implement the data scheduling method provided in this application.
[0023] In the embodiments of this application, each rack in the distributed storage system is configured with a weight coefficient. The initial value of the weight coefficient is determined based on the number of data nodes deployed in the rack. During the data scheduling process, after determining the target rack for allocating at least one data to be scheduled, the weight coefficient of the target rack is reduced according to the number of data to be scheduled for the target object allocated to the target rack. This achieves dynamic updating of the weight coefficient of the rack, thereby reducing the probability of the target rack being selected subsequently, and thus reducing the probability of data nodes in the target rack being selected. This ensures data balance among data nodes, enabling the data distribution of the distributed storage system to maintain a relatively balanced state and improving the availability of the distributed storage service.
[0024] These and other aspects of this application will become more apparent from the following description of the embodiments.
Brief Description of the Drawings
[0025] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0026] Figure 1 is a flowchart of an embodiment of a data scheduling method provided in this application;
[0027] Figure 2 is a schematic structural diagram of a distributed storage system provided in this application;
[0028] Figure 3 is a scenario interaction diagram of the technical solution of the embodiments of this application in a practical application;
[0029] Figure 4 is a schematic structural diagram of an embodiment of a data scheduling device provided in this application;
[0030] Figure 5 is a schematic structural diagram of an embodiment of a computing device provided in this application.
Detailed Description
[0031] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
[0032] In some of the processes described in the specification, claims, and accompanying drawings of this application, multiple operations appearing in a specific order are included. However, it should be clearly understood that these operations may not be executed in the order they appear herein, or may be executed in parallel. The operation numbers, such as 101, 102, etc., are merely used to distinguish different operations and do not themselves represent any execution order. Furthermore, these processes may include more or fewer operations, and these operations may be executed sequentially or in parallel. It should be noted that the descriptions such as "first," "second," etc., in this document are used to distinguish different messages, devices, modules, etc., and do not represent a chronological order, nor do they limit "first" and "second" to different types.
[0033] It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, data stored, data displayed, etc.) involved in this application are all information and data authorized by the user or fully authorized by all parties. Furthermore, the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation portals are provided for users to choose to authorize or refuse.
[0034] It should be noted that the technical solutions of the embodiments of this application are applicable to network virtual environments. A user here generally refers to a "virtual user": real users can register user accounts on the server to obtain a user identity in the network environment. In the embodiments of this application, the same user account can be used to log in to the server through different types of clients, so that the server can identify the same user. Different user accounts can also be used to log in to the server through different types of clients; because the server stores the binding relationships between user accounts, different user accounts that are bound to each other can be regarded as the same user.
[0035] The technical solutions of this application can be applied to distributed storage scenarios. A distributed storage system is a storage system that distributes data across multiple data nodes. These data nodes are interconnected and work collaboratively through a network, presenting themselves as a unified storage resource pool under the coordination of a control node. Distributed storage systems typically use storage media as the physical carrier for storing data to preserve it long-term.
[0036] As described above, a distributed storage system can consist of a control node, also known as the Master node, and multiple data nodes. The control node acts as the "brain" of the distributed storage system. It is responsible for managing and coordinating the operation of the entire storage system, including tasks such as storage resource allocation, data scheduling, replica management, and system monitoring. For example, when new data needs to be stored, the control node determines which data nodes the data should be stored on based on a pre-defined data distribution strategy (such as hash distribution or replication strategy), ensuring that the data is distributed reasonably and efficiently throughout the storage system. Data nodes are the actual units for storing data in a distributed storage system. Each data node is equipped with storage media, such as HDDs (Hard Disk Drives) or SSDs (Solid-State Drives), to store data and provide data retrieval services when needed.
[0037] In distributed storage systems, data availability and fault tolerance are ensured by dividing large datasets (such as files) into multiple smaller chunks and replicating and distributing these chunks across different data nodes. Distributed storage systems typically employ data redundancy strategies such as multi-replica or erasure coding (EC). Multi-replica involves dividing the original data into multiple chunks and replicating each chunk multiple times, with each replica distributed across different data nodes. Erasure coding, on the other hand, divides the original data into multiple chunks and generates additional checksum blocks. These chunks and checksum blocks are then distributed across different data nodes, achieving data redundancy and recovery. In practical applications, distributed storage systems can provide storage services to service providers. When a distributed storage system serves as the underlying storage system, the service provider can refer to a storage service system built upon it, such as OSS (Object Storage Service) or EBS (Elastic Block Store). Alternatively, a distributed storage system can also serve as a user-facing storage system, where the service provider is the user.
[0038] In arriving at the concept of this application, the inventors found that, because distributed storage systems are scaled dynamically, different racks end up with different numbers of data nodes, a phenomenon known as rack heterogeneity. In this situation, a rack that receives too much data becomes a hotspot, which leads to data imbalance, lowers system utilization, and affects service availability. The data nodes within that rack also face heavy workload pressure: when a large number of read and write requests are concentrated on the hotspot rack, its limited processing capacity can cause delayed responses or even a failure to respond in time. This directly affects the availability of the storage service; users may experience slow reads or writes, or be unable to perform operations at all, which severely degrades the user experience.
[0039] Further research by the inventors revealed that the hotspot issue arises because related technologies randomly select racks according to weight coefficients and then choose data nodes from those racks. Specifically, the weight coefficient of each rack is pre-calculated from the number of data nodes deployed in that rack of the distributed storage system, and during data scheduling the target rack for allocating data is randomly selected according to these weight coefficients. However, because of rack heterogeneity, the number of data nodes differs from rack to rack. A rack with more data nodes has a larger weight coefficient, but because the number of pieces of a target object's data that any one rack may receive is limited, the data nodes in a rack with fewer data nodes end up being selected with a higher per-node probability, which affects data balance.
[0040] To ensure data balance in heterogeneous rack environments, the inventors considered that, since the magnitude of a weight coefficient affects the probability of the rack being selected, the weight coefficients of racks likely to become hotspots could be adjusted. However, deciding how to adjust them, and to what values, is very complex. Furthermore, the calculated weight coefficient is strongly correlated with factors such as the number of data nodes deployed in the rack, the data redundancy strategy, and the rack security strategy. A change in any of these factors changes the weight coefficient, which makes a scheme of predefined rack weight coefficients complex and non-universal, and such a scheme would still affect the service availability of the distributed storage system.
[0041] Therefore, after a series of considerations, the inventors proposed the technical solution of the embodiments of this application. In the embodiments of this application, in the distributed storage system, a weight coefficient is configured for each rack. The initial value of the weight coefficient is determined according to the number of data nodes deployed in the rack. During the data scheduling process, after each rack is allocated data to be scheduled, its weight coefficient is decayed. The probability of the rack being selected later is reduced by dynamically updating the weight, thereby reducing the probability of data nodes in the rack being selected. This ensures data balance among data nodes, so that the data distribution of the distributed storage system can maintain a relatively balanced state and improve the availability of the distributed storage service.
[0042] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.
[0043] The implementation details of the technical solutions in the embodiments of this application are described in detail below.
[0044] Figure 1 is a flowchart of a data scheduling method according to an embodiment of this application. The data scheduling method can be applied to a distributed storage system that includes a control node and multiple data nodes, and can be executed by the control node. The multiple data nodes are distributed across multiple racks, and these racks constitute a rack set. As shown in Figure 1, the data scheduling method may include the following steps:
[0045] 101: In response to a data scheduling request, determine at least one data item to be scheduled for the target object.
[0046] In the embodiments of this application, during the actual operation of the distributed storage system, data scheduling requests may be triggered by a variety of situations.
[0047] In one alternative embodiment of this application, the data scheduling request may be triggered by the service provider, so the method may further include: obtaining the data scheduling request sent by the service provider.
[0048] As mentioned above, in practical applications, the service provider of a distributed storage system can include a storage service system built on top of the distributed storage system, or a user. Therefore, data scheduling requests can be triggered when the storage service system or the user has a new data write requirement.
[0049] In another alternative embodiment of this application, the data scheduling request may also be generated when an anomaly is detected in any data corresponding to the target object, such as data corruption or degraded storage performance of the storage medium, which requires data rescheduling for data recovery or data migration.
[0050] In one alternative embodiment of this application, the target object can be a data block obtained by splitting the original data, and each piece of data to be scheduled can be a replica of the target object. The at least one piece of data to be scheduled can include multiple replicas, generated by copying the target object multiple times, so as to implement the multi-replica data redundancy strategy described above.
[0051] The aforementioned data scheduling request may be triggered by the service provider requesting to store multiple copies of the target object's data, or it may be generated when the control node detects an anomaly in any copy of the target object's data.
[0052] In another alternative embodiment of this application, the target object may be the original data, and the at least one piece of data to be scheduled may include the data blocks obtained by dividing the target object together with the parity (check) blocks generated from them, so as to implement the erasure coding data redundancy strategy described above.
[0053] 102: Determine the weight coefficient of each of the multiple racks.
[0054] The initial value of a rack's weight coefficient can be determined based on the number of storage media deployed in the rack, that is, the storage media of the data nodes deployed in it. The initial value can be the number of storage media or a multiple of it.
[0055] For example, assuming an initial weight value that is twice the number of storage media, and assuming a distributed storage system includes 5 data nodes: A, B, C, D, and E, distributed across 3 racks: rack_1 (A, B), rack_2 (C, D), and rack_3 (E), with each data node deploying 76 storage media, then the initial weight coefficient of rack_1 is (76+76)*2 = 304; the initial weight coefficient of rack_2 is (76+76)*2 = 304; and the initial weight coefficient of rack_3 can be 76*2 = 152. The initial weight coefficients corresponding to multiple racks constitute a static weight combination cops = [304, 304, 152]. Furthermore, the weight coefficients corresponding to multiple racks can constitute a dynamic weight combination ops, whose initial value is also [304, 304, 152].
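The arithmetic in this example can be reproduced with a short Python sketch. The helper name `initial_weights` and its `multiple=2` default are illustrative assumptions mirroring the example, not part of the described method; the rack layout and the 76 storage media per data node come from the example above.

```python
from typing import Dict, List


def initial_weights(racks: Dict[str, List[str]],
                    media_per_node: Dict[str, int],
                    multiple: int = 2) -> Dict[str, int]:
    """Hypothetical helper: initial weight = multiple x storage media in the rack."""
    return {rack: multiple * sum(media_per_node[node] for node in nodes)
            for rack, nodes in racks.items()}


racks = {"rack_1": ["A", "B"], "rack_2": ["C", "D"], "rack_3": ["E"]}
media_per_node = {node: 76 for node in "ABCDE"}

cops = initial_weights(racks, media_per_node)  # static weight combination
ops = dict(cops)                               # dynamic weight combination, decayed later
print(list(cops.values()))  # [304, 304, 152]
```

Keeping `ops` as a separate copy leaves the static values in `cops` available for later comparison or restoration.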
[0056] The number of data nodes deployed in a rack is related to the rack's storage capacity. Racks with more data nodes generally have stronger storage capacity and can accommodate more data to be scheduled.
[0057] 103: Based on the weight coefficients corresponding to multiple racks, determine at least one target rack to allocate at least one piece of data to be scheduled.
[0058] After selecting at least one target rack, a target data node can be randomly selected from each of the at least one target rack to store the data to be scheduled. Of course, the selection of target data nodes can also be achieved in other ways, which will be described in the following embodiments.
[0059] In the embodiments of this application, the target rack can be selected randomly according to its weight ratio. For example, racks with higher weight coefficients are more likely to be selected. Through this probabilistic selection mechanism, the data to be scheduled is more likely to be allocated to racks with stronger storage capacity, but at the same time, a certain degree of randomness is retained to avoid allocating the data to be scheduled to only a few racks with high weights, so as to ensure that the data has a relatively reasonable distribution possibility among multiple racks.
[0060] 104: Based on the number of pieces of data to be scheduled of the target object allocated to each of the at least one target rack, attenuate the weight coefficient corresponding to each of the at least one target rack.
[0061] In the process of developing this application, the inventors found that, after the target rack is determined, a data node may be randomly selected from it as the target data node. Because racks are heterogeneous and contain different numbers of data nodes, always selecting target racks with the initial values of the weight coefficients may increase the probability of certain data nodes being selected, so data balance among the data nodes cannot be guaranteed.
[0062] For example, as in the above example, suppose a distributed storage system includes 5 data nodes A, B, C, D, and E, distributed across 3 racks: rack_1 (A, B), rack_2 (C, D), and rack_3 (E). Without rack constraints, the expected selection rate of each data node is 2/5. With rack constraints, however, because rack_3 contains only one data node, the expected selection rate of data node E is greater than 2/5, while the expected selection rates of data nodes A, B, C, and D are less than 2/5. As a result, the data node in the smaller rack (data node E in rack_3) is allocated more chunks, which creates hotspots and impacts service availability, or lowers overall cluster utilization and increases storage costs.
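The rates in this example can be checked exactly. The sketch below assumes, since the text does not state it explicitly, that each target object has two pieces of data to be scheduled (which is what a 2/5 baseline over five data nodes implies), that each rack may receive at most one piece per object, that racks are drawn in proportion to the static weights [304, 304, 152], and that a data node is chosen uniformly within each rack. The function name `node_selection_rates` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from typing import Dict, List


def node_selection_rates(racks: Dict[str, List[str]],
                         weights: Dict[str, int]) -> Dict[str, Fraction]:
    """Exact probability that each data node receives one of two pieces of an object.

    Two distinct racks are drawn without replacement, each draw proportional to
    the static weights of the racks still available; inside a chosen rack the
    data node is picked uniformly.
    """
    total = sum(weights.values())
    rack_prob = {rack: Fraction(0) for rack in racks}
    for first, second in permutations(racks, 2):
        # Probability of drawing `first`, then `second` from the remaining racks.
        p = Fraction(weights[first], total) * Fraction(weights[second], total - weights[first])
        rack_prob[first] += p
        rack_prob[second] += p
    return {node: rack_prob[rack] / len(nodes)
            for rack, nodes in racks.items() for node in nodes}


racks = {"rack_1": ["A", "B"], "rack_2": ["C", "D"], "rack_3": ["E"]}
rates = node_selection_rates(racks, {"rack_1": 304, "rack_2": 304, "rack_3": 152})
print(rates["E"], rates["A"])  # 7/15 23/60
```

Under these assumptions data node E is selected with probability 7/15 (about 0.467) and each of A, B, C, and D with 23/60 (about 0.383), compared with the 2/5 = 0.4 baseline.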
[0063] Based on this, in the embodiments of this application, after determining at least one target rack, the weight coefficients of each target rack can be attenuated according to the actual amount of data to be scheduled for the target objects allocated to each target rack, thereby achieving dynamic updates to the rack weight coefficients. The amount of data to be scheduled for the target objects allocated to each target rack can refer to the total amount of data to be scheduled belonging to that target object that has been cumulatively allocated. By attenuating the weight coefficients of the target racks, the probability of the target rack being selected subsequently can be reduced, thereby reducing the probability of data nodes within the target rack being selected, thus ensuring data balance among data nodes. This avoids a situation where a rack, due to a high initial weight coefficient or being selected multiple times in the early stages, continuously bears excessive data storage tasks, leading to increasingly unbalanced data distribution. For example, if a rack is allocated a large amount of data in a data scheduling session, after its weight coefficient is attenuated, its probability of being selected in the next data scheduling session will decrease accordingly, thus guiding more data to be scheduled to flow to other underutilized racks, enabling the distributed storage system to maintain a relatively balanced data distribution, improving the overall performance and resource utilization of the distributed storage system.
[0064] In some embodiments, determining at least one target rack based on the weight coefficients corresponding to the multiple racks can be specifically implemented as follows: determining at least one target rack based on the weight coefficients corresponding to the multiple racks and a storage constraint limit.
[0065] The storage constraint limit is the maximum number of pieces of data to be scheduled belonging to the target object that each rack may store. Therefore, in the embodiments of this application, when determining the target rack, the multiple racks can be filtered against the storage constraint limit so that each determined target rack is one that has not yet reached the limit.
[0066] The storage constraint limit can be preset or, in practical applications, determined in combination with security requirements. In some embodiments, the method may further include: determining the storage constraint limit corresponding to any target rack according to the security requirements of the target object.
[0067] In distributed storage systems, different target objects have different rack security requirements due to their importance, application scenarios, and other factors. Security requirements are often related to fault tolerance. For example, a security requirement might be to ensure that even if a rack fails (such as a power outage, network failure, or hardware damage), the data remains intact and available without affecting the normal operation of the system.
[0068] Based on the security requirements of the target object, the control node can determine the storage constraint limit of each rack accordingly. These security requirements may include, for example, Rack_Domain or Rack-Machine_Domain. They can be configured by the service provider, or determined by the control node based on the data characteristics of the target object.
[0069] Assuming the target object corresponds to 3 replicas, then:
[0070] Rack_Domain means that the failure of any single rack causes no data loss. To meet the Rack_Domain security requirement, each rack may hold at most two pieces of data to be scheduled, so the storage constraint limit is two. If a rack fails, at most two pieces of data to be scheduled become unreadable, leaving one readable piece of data for the target object, which satisfies the Rack_Domain security requirement.
[0071] Rack-Machine_Domain means that the simultaneous failure of one rack and of one data node in another rack causes no data loss; it offers higher security than Rack_Domain. Taking the target object with 3 replicas as an example, Rack-Machine_Domain requires that each rack hold at most 1 replica; in this case, the storage constraint limit is 1.
[0072] It should be noted that the above are merely examples of how security requirements may be implemented. In practical applications, in addition to racks and data nodes, the area where the racks are located can also be considered when determining the storage constraint limit of each rack, that is, the maximum number of pieces of data to be scheduled of the target object that each rack may store. This application does not impose any limitations on this.
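For the multi-replica case, the two examples above follow a single pattern, sketched below. Generalizing it to an arbitrary replica count, and the helper name `storage_limit`, are our assumptions; erasure-coded objects, or requirements that also consider the area where racks are located, would need a different rule.

```python
def storage_limit(num_replicas: int, requirement: str) -> int:
    """Hypothetical helper: max replicas of one target object a single rack may hold.

    Rack_Domain: losing any one rack must leave at least one replica, so a rack
    may hold at most num_replicas - 1.
    Rack-Machine_Domain: losing one rack plus one data node in another rack must
    leave at least one replica, so a rack may hold at most num_replicas - 2.
    """
    if requirement == "Rack_Domain":
        limit = num_replicas - 1
    elif requirement == "Rack-Machine_Domain":
        limit = num_replicas - 2
    else:
        raise ValueError(f"unknown security requirement: {requirement}")
    if limit < 1:
        raise ValueError("too few replicas to meet this security requirement")
    return limit


print(storage_limit(3, "Rack_Domain"), storage_limit(3, "Rack-Machine_Domain"))  # 2 1
```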
[0073] In some embodiments, determining at least one target rack based on the weight coefficients corresponding to the multiple racks can be implemented as follows: randomly select a target rack from the rack set according to the weight coefficient of each rack; select a data node from the target rack and allocate one piece of data to be scheduled to it; if the number of pieces of data to be scheduled of the target object allocated to the target rack reaches the storage constraint limit, delete the selected target rack from the rack set to update the rack set; then return to the step of randomly selecting a target rack from the rack set according to the weight coefficients, and continue until all of the at least one piece of data to be scheduled has been allocated. After that, the rack set can be restored to the original set of multiple racks so that scheduling can continue for subsequent target objects, by which point the weight coefficients of the target racks have been decayed. It should be noted that deleting the selected target rack from the rack set is an algorithmic operation rather than an actual removal of the rack; it only means that the selected target rack no longer participates in the allocation of the remaining unallocated data to be scheduled of this target object.
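The loop in this paragraph, together with the decay of step 104, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `schedule_object` is a hypothetical name, Python's `random.Random.choices` stands in for the weighted random selection, the decay uses a target value of 1 per piece, weights are clamped at zero, and within a rack a data node that does not already hold a piece of the same object is chosen uniformly (a rack with no such node left is also dropped from the working set).

```python
import random
from typing import Dict, List, Optional, Tuple


def schedule_object(ops: Dict[str, float],
                    racks: Dict[str, List[str]],
                    num_pieces: int,
                    limit: int,
                    decay_per_piece: float = 1.0,
                    rng: Optional[random.Random] = None) -> List[Tuple[str, str]]:
    """Place num_pieces pieces of one target object and decay ops in place.

    ops: dynamic weight coefficient of each rack; racks: rack -> data nodes;
    limit: storage constraint limit per rack for this object.
    Returns the (rack, data node) chosen for each piece.
    """
    rng = rng or random.Random()
    rack_set = [rack for rack in racks if ops[rack] > 0]  # working copy only
    allocated = {rack: 0 for rack in racks}
    used_nodes = set()
    placements = []
    for _ in range(num_pieces):
        if not rack_set:
            raise RuntimeError("no rack can accept another piece of this object")
        # Weighted random choice of the target rack from the current rack set.
        rack = rng.choices(rack_set, weights=[ops[r] for r in rack_set])[0]
        # Assumption: pick, uniformly, a node not yet holding this object.
        free_nodes = [node for node in racks[rack] if node not in used_nodes]
        node = rng.choice(free_nodes)
        used_nodes.add(node)
        allocated[rack] += 1
        placements.append((rack, node))
        # "Delete" the rack from the working set once it reaches the limit
        # (or, by assumption, once it has no free node left).
        if allocated[rack] >= limit or len(free_nodes) == 1:
            rack_set.remove(rack)
    # Step 104: decay each target rack by pieces allocated x decay_per_piece.
    for rack, count in allocated.items():
        if count:
            ops[rack] = max(0.0, ops[rack] - count * decay_per_piece)
    return placements


racks = {"rack_1": ["A", "B"], "rack_2": ["C", "D"], "rack_3": ["E"]}
ops = {"rack_1": 304, "rack_2": 304, "rack_3": 152}
rng = random.Random(42)
for _ in range(100):  # 100 objects, 2 pieces each, at most 1 piece per rack
    schedule_object(ops, racks, num_pieces=2, limit=1, rng=rng)
print(ops)
```

Because each piece lowers the weight of the rack that received it, racks that have already absorbed many pieces become progressively less likely to be chosen for later objects.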
[0074] Selecting a target rack from a set of racks can be achieved, for example, in the following manner:
[0075] First, calculate the sum S of the weight coefficients of all racks in the rack set; then generate a random number N in the range [1, S]; then scan the rack set while accumulating the weight coefficients, and select as the target rack the first rack at which N is less than or equal to the accumulated sum. For integer weight coefficients, each rack is thus selected with probability equal to its weight coefficient divided by S.
[0076] If the number of pieces of data to be scheduled of the target object allocated to the target rack reaches the storage constraint limit, the selected target rack can be removed from the rack set to update the rack set. The operation of selecting a target rack from the rack set can then be executed again to allocate the next piece of data to be scheduled.
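A standard way to realize the selection of a target rack with a random number N in [1, S] is a cumulative scan over the rack set, sketched below. The function names are hypothetical, and the sketch assumes integer weight coefficients, as in the example where the initial values are 304, 304 and 152; for non-integer weights, such as those produced by proportional decay, a real-valued random number would be used instead.

```python
import random
from typing import List, Optional


def pick_by_number(weights: List[int], n: int) -> int:
    """Return the index of the rack whose cumulative-weight interval contains n."""
    if not 1 <= n <= sum(weights):
        raise ValueError("n must lie in [1, S], where S is the sum of the weights")
    cumulative = 0
    for index, weight in enumerate(weights):
        cumulative += weight
        if n <= cumulative:
            return index
    raise AssertionError("unreachable for non-negative weights")


def pick_rack(weights: List[int], rng: Optional[random.Random] = None) -> int:
    """Draw N uniformly from [1, S] (inclusive) and map it to a rack index."""
    rng = rng or random.Random()
    return pick_by_number(weights, rng.randint(1, sum(weights)))
```

With the weights [304, 304, 152], N in 1..304 selects rack_1, 305..608 selects rack_2, and 609..760 selects rack_3, so the three racks are chosen with probabilities 0.4, 0.4 and 0.2.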
[0077] In one alternative embodiment of this application, the attenuation of the weight coefficient of the target rack based on the number of data to be scheduled for the target object allocated to the target rack can be specifically implemented as follows: determining the attenuation value of the weight coefficient corresponding to the number of data to be scheduled for the target object allocated to the target rack; and attenuating the weight coefficient corresponding to the target rack using the weight attenuation value.
[0078] In the embodiments of this application, a target value corresponding to the allocation of a single piece of data can be preset. This target value can be 1, and the weight coefficient decay value can be the product of the number of allocated data to be scheduled and the target value. When the target value is 1, that is, when the weight coefficient decays by 1 for each piece of data to be scheduled allocated to the rack, the weight coefficient decay value can be the number of data to be scheduled. For example, if the current weight coefficient of the target rack is 10 and the number of allocated data to be scheduled is 2, then the weight coefficient decay value is 2. When it is necessary to decay the weight coefficient of the target rack, the decayed weight coefficient can be: 10 - 2 = 8.
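A minimal sketch of this subtractive decay, assuming a target value of 1 as in the example. The name `decay_by_value` is hypothetical, and clamping the result at zero is our assumption rather than something the text specifies.

```python
def decay_by_value(weight: float, allocated: int, target_value: float = 1) -> float:
    """Subtract allocated * target_value from the weight, never going below zero."""
    return max(0, weight - allocated * target_value)


print(decay_by_value(10, 2))  # 8
```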
[0079] In another alternative embodiment of this application, the attenuation of the weight coefficient of the target rack based on the number of data to be scheduled for the target object allocated to the target rack can be specifically implemented as follows: determining the weight coefficient attenuation ratio corresponding to the number of data to be scheduled for the target object allocated to the target rack; and attenuating the weight coefficient corresponding to the target rack using the weight coefficient attenuation ratio.
[0080] In the embodiments of this application, a target proportion corresponding to the allocation of a single data item can be preset, such as 10%. The weight coefficient decay ratio can be the product of the number of allocated data items to be scheduled and the target proportion. For example, if the current weight coefficient of the target rack is 10, the target proportion is 10%, and the number of allocated data items to be scheduled is 2, then the weight coefficient decay ratio is 2 × 10% = 20%, and the decayed weight coefficient is 10 × (1 - 20%) = 8.
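Likewise, a sketch of the decay-by-ratio rule of [0080]. The name `decay_by_ratio` and the cap at a 100% decay ratio are assumptions for illustration.

```python
def decay_by_ratio(weight, allocated, target_proportion=0.10):
    """Scale the rack's weight down by allocated * target_proportion."""
    decay_ratio = allocated * target_proportion   # weight coefficient decay ratio
    decay_ratio = min(decay_ratio, 1.0)           # assumption: never decay by more than 100%
    return weight * (1 - decay_ratio)
```

Note that the result is generally a float even when the weights start as integers.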
[0081] Of course, attenuating the weight coefficients of the at least one target rack based on the number of data to be scheduled of the target object allocated to it can also be done incrementally: each time a piece of data to be scheduled is allocated to a target rack, the weight coefficient of that rack is attenuated by the target value or the target proportion.
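Applying the decay per piece rather than in one batch gives the same result for decay by value, but not necessarily for decay by ratio, where the steps compound: with a 10% target proportion and two pieces, 10 becomes 9 and then 8.1, whereas the batch computation in [0080] yields 8. The sketch below (hypothetical function name) makes the difference concrete; the source does not say which behavior an implementation should use.

```python
def decay_each_piece(weight, pieces, mode, amount):
    """Apply one decay step each time a piece of data is allocated to the rack.

    mode "value": subtract `amount` per piece (clamped at 0, an assumption).
    mode "ratio": multiply by (1 - amount) per piece.
    """
    for _ in range(pieces):
        if mode == "value":
            weight = max(weight - amount, 0)
        elif mode == "ratio":
            weight = weight * (1 - amount)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return weight
```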
[0082] In some embodiments, the method may further include:
[0083] Calculate the allocation value of the multiple racks based on the total amount of data already allocated to them; calculate the weight sum, i.e., the sum of the initial values of the weight coefficients of the multiple racks; when the ratio of the allocation value to the weight sum reaches a predetermined value, reset the weight coefficients of the multiple racks to their initial values.
[0084] The total data volume can refer to the total number of data to be scheduled corresponding to all objects allocated to multiple racks, while the allocation value can be a multiple of the total data volume. Optionally, the multiple can be 1, and the allocation value can be the total data volume allocated to multiple racks. For example, after a piece of data to be scheduled is allocated to a rack, the allocation value can be incremented by 1.
[0085] Following the example above, suppose the distributed storage system includes 5 data nodes: A, B, C, D, and E, distributed across 3 racks: rack_1 (A, B), rack_2 (C, D), and rack_3 (E). Each data node has 76 storage media, and the initial weight coefficient of a rack is twice the number of storage media it contains. The initial weight coefficient for rack_1 is (76 + 76) * 2 = 304; for rack_2 it is (76 + 76) * 2 = 304; and for rack_3 it is 76 * 2 = 152. The weight sum is therefore 304 + 304 + 152 = 760. Each time the two copies of an object are allocated, the allocation value is incremented by 2. Assuming a predetermined value of 0.95, once the allocation value reaches 722, the ratio is 722 / 760 = 0.95, which is greater than or equal to the predetermined value. At that point, the weight coefficients can be reset to their initial values.
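The reset condition in [0083] to [0085] is a single ratio test. A minimal sketch using the figures from the example above (the function name is hypothetical):

```python
def weights_need_reset(allocation_value, initial_weights, predetermined=0.95):
    """True once the allocation value reaches the predetermined share of the weight sum."""
    weight_sum = sum(initial_weights)            # 304 + 304 + 152 = 760 in the example
    return allocation_value / weight_sum >= predetermined
```

When this returns True, the dynamic weight coefficients are set back to the initial values [304, 304, 152].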
[0086] In the embodiments of this application, by updating the weight coefficients of multiple racks to their initial values, a periodic and dynamically balanced weight adjustment mechanism can be implemented to ensure that data scheduling proceeds normally and to guarantee the stable and efficient operation of the entire storage system.
[0087] In some embodiments, after determining at least one target rack for allocating the at least one data to be scheduled based on the weight coefficients corresponding to multiple racks, the method may further include:
[0088] Determine whether the number of data to be scheduled of the target object allocated to any target rack exceeds the storage constraint limit; if yes, re-determine the at least one target rack for allocating the at least one data to be scheduled based on the initial values of the weight coefficients of the multiple racks; if no, attenuate the weight coefficients of the at least one target rack based on the number of data to be scheduled of the target object allocated to it.
[0089] As the weight coefficients decay, some racks' weight coefficients may reach a preset threshold, such as 0, while others remain above it (i.e., non-zero). Data to be scheduled can then only be allocated to racks with non-zero weight coefficients, which may cause the amount of the object's data allocated to some racks to exceed the storage constraint limit and compromise data safety. Therefore, in this embodiment, the number of data to be scheduled of the target object allocated to each target rack can be checked. If it exceeds the storage constraint limit, rescheduling is performed based on the initial values of the weight coefficients of the multiple racks. If it does not exceed the storage constraint limit for any target rack, the process continues: the weight coefficients of the at least one target rack are decayed, and the data to be scheduled is scheduled to the corresponding target rack.
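One way to express the check in [0088] and [0089], assuming the placement for an object is given as a list of rack indices (one per piece of data) and that each piece decays its rack's weight by one step. `check_then_decay` is a hypothetical helper; re-determining the racks from the initial weights is left to the caller.

```python
from collections import Counter


def check_then_decay(ops, assigned_racks, limit, step=1):
    """Validate a placement made with the dynamic weights, then decay them.

    ops: current (dynamic) weight coefficients, one per rack.
    assigned_racks: rack index chosen for each piece of the object's data.
    limit: storage constraint limit per rack.
    Returns (accepted, new_ops). When accepted is False, ops is returned
    unchanged and the caller should re-determine the racks from the initial
    weight coefficients instead.
    """
    per_rack = Counter(assigned_racks)
    if any(count > limit for count in per_rack.values()):
        return False, list(ops)
    new_ops = list(ops)
    for rack, count in per_rack.items():
        new_ops[rack] = max(new_ops[rack] - count * step, 0)
    return True, new_ops
```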
[0090] In some embodiments, determining whether the number of data to be scheduled of the target object allocated to any target rack exceeds the storage constraint limit may include:
[0091] If the weight coefficient of any rack has decayed to a preset threshold, determining whether the number of data to be scheduled of the target object allocated to any target rack exceeds the storage constraint limit.
[0092] The preset threshold can be 0.
[0093] In other words, this check can be triggered once the weight coefficient of some rack has decayed to the preset threshold. If no rack's weight coefficient has decayed to the preset threshold, the process can directly decay the weight coefficient of the at least one target rack and schedule the data to be scheduled to the corresponding target rack.
[0094] Furthermore, as described above, the allocation value of the multiple racks can be calculated from the total amount of data already allocated to them. In this embodiment, after the at least one target rack is re-determined based on the initial values of the weight coefficients of the multiple racks, the newly allocated data is still added to the total amount of data, and the allocation value is updated accordingly. That is, the allocation value is updated whether the data to be scheduled is allocated according to the dynamic weight coefficients or the static weight coefficients.
[0095] For ease of understanding, the above example will still be used to introduce the technical solution of this application.
[0096] Suppose a distributed storage system consists of 5 data nodes: A, B, C, D, and E, distributed across 3 racks: rack_1(A,B), rack_2(C,D), and rack_3(E). Each data node is equipped with 76 storage media. Assume that the initial value of the weighting coefficient is twice the number of storage media.
[0097] The weight coefficients of data nodes A, B, C, D, and E are [152, 152, 152, 152, 152];
[0098] The initial value of the weight coefficient is (76+76)*2 = 304 for rack_1, (76+76)*2 = 304 for rack_2, and 76*2 = 152 for rack_3. The initial values of the weight coefficients of the racks form a static weight array cops = [304, 304, 152]. The current weight coefficients of the racks form a dynamic weight array ops, whose initial value is also [304, 304, 152]. SumR denotes the allocation value, and SumOps denotes the weight sum.
[0099] In the initialization case:
[0100] Dynamic weights ops = [304, 304, 152]; static weights cops = [304, 304, 152]; allocation value SumR = 0; weight sum SumOps = 152 * 5 = 304 + 304 + 152 = 760. The predetermined value is 0.95. The storage constraint is 1, meaning each rack can hold at most one copy; the target object is assumed to have two copies.
[0101] When data scheduling is based on ops:
[0102] S0: Based on ops[], randomly select one rack from the three racks, let's say rack_1.
[0103] S1: Remove the weight coefficient of rack_1 from ops[]; then, based on the updated ops[], randomly select a rack from the remaining racks, say rack_2. If the selected rack does not satisfy the storage constraint, the allocation fails and data scheduling is performed according to cops instead.
[0104] S2: Randomly select a data node from each of rack_1 and rack_2; the allocation succeeds.
[0105] S3: Decrease the weight coefficients of rack_1 and rack_2 in ops[] by 1.
[0106] S4: SumR += 2. If SumR / SumOps >= 0.95, reset ops to [304, 304, 152].
[0107] When data scheduling is based on cops:
[0108] S0: Based on cops[], randomly select one rack from the three racks, let's say it's rack_1.
[0109] S1: Remove the weight coefficient of rack_1 from cops[]; then, based on the updated cops[], randomly select a rack from the remaining racks, say rack_2.
[0110] S2: Randomly select a data node from each of rack_1 and rack_2; the allocation succeeds.
[0111] S3: SumR += 2. If SumR / SumOps >= 0.95, reset ops to [304, 304, 152].
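The two procedures above can be simulated end to end. This Python sketch assumes two copies per object and a storage constraint of one copy per rack, treats "no rack with a non-zero weight left" as the allocation failure that triggers the fallback to cops, omits node selection (S2), and resets SumR to 0 together with ops, which the source does not state explicitly. All names are illustrative.

```python
import random

COPS = [304, 304, 152]      # static weights cops for rack_1, rack_2, rack_3
SUM_OPS = sum(COPS)         # weight sum SumOps = 760
PREDETERMINED = 0.95


def pick_distinct(weights, copies, rng):
    """S0/S1: draw racks by weight, removing each chosen rack before the next draw.

    Removing the chosen rack enforces the storage constraint of one copy per rack.
    Returns None (allocation failure) if no rack with a non-zero weight is left.
    """
    remaining = dict(enumerate(weights))
    chosen = []
    for _ in range(copies):
        ids = [i for i, w in remaining.items() if w > 0]
        if not ids:
            return None
        rack = rng.choices(ids, weights=[remaining[i] for i in ids], k=1)[0]
        chosen.append(rack)
        del remaining[rack]
    return chosen


class RackScheduler:
    def __init__(self, rng):
        self.ops = list(COPS)   # dynamic weights ops
        self.sum_r = 0          # allocation value SumR
        self.rng = rng

    def schedule(self, copies=2):
        racks = pick_distinct(self.ops, copies, self.rng)
        if racks is not None:
            for rack in racks:  # S3 (ops path): decay each chosen rack by 1
                self.ops[rack] -= 1
        else:                   # allocation failed: schedule according to cops
            racks = pick_distinct(COPS, copies, self.rng)
        # S2, choosing a data node inside each rack, is left out of this sketch.
        self.sum_r += copies    # SumR is updated on both paths
        if self.sum_r / SUM_OPS >= PREDETERMINED:
            self.ops = list(COPS)
            self.sum_r = 0      # assumption: the source does not say SumR restarts
        return racks
```

Because decay only touches racks drawn with a positive weight, ops never goes negative, and once SumR reaches 722 the dynamic weights start over from cops.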
[0112] In some embodiments, the method may further include: generating a data scheduling request when any copy of the target object is detected to be abnormal. Based on this, determining at least one data item to be scheduled corresponding to the target object includes: determining the copy data of the target object.
[0113] In embodiments of this application, when any copy data anomaly is detected, the control node needs to take timely measures to repair and adjust it in order to ensure data integrity, redundancy, and availability. In this case, a data scheduling request can be generated. The data scheduling request is used to request the replacement or repair of the abnormal copy data by scheduling other normal copy data or regenerating a copy, so that the number of copies and data content of the target object on various storage media and data nodes can be restored to a normal state, ensuring that the data can still be reliably accessed and used. For example, in a distributed storage system storing important enterprise document data, if an anomaly is detected in the copy data on a certain data node, after generating a data scheduling request, the system can schedule normal copy data on other data nodes to cover the abnormal copy, or regenerate a copy based on the original data and store it in a suitable location to maintain the redundant backup state of the data.
[0114] In some embodiments, the distributed storage system provides storage media of multiple media types; at least one media type of storage media is deployed in multiple data nodes respectively.
[0115] In embodiments of this application, the storage media types provided by the distributed storage system can be categorized along different dimensions. For example, in one possible implementation, media types can be categorized by storage performance into high-speed, medium-speed, and low-speed types; by storage capacity into large-capacity, medium-capacity, and small-capacity types; by cost into high-cost, medium-cost, and low-cost types; or by working principle into SSD, HDD, and other types. The same storage medium can belong to different media types under different dimensions. For example, SSDs have fast read and write speeds and thus belong to the high-speed media type; their common capacities range from 128 GB to several TB, placing them in the medium-capacity media type; and their relatively high cost per unit of storage places them in the high-cost media type.
[0116] In practical applications, the classification dimension can be chosen according to the application's requirements, and media types defined accordingly.
[0117] For ease of understanding, the following one or more embodiments mainly use the classification of media types according to the storage performance dimension to explain the specific implementation of the embodiments of this application.
[0118] In some embodiments, the method may further include: determining at least two media types corresponding to the target object from a plurality of media types.
[0119] In the embodiments of this application, at least two media types corresponding to the target object can be determined from a variety of media types based on the storage method preset by the service provider. For example, determining at least two media types corresponding to the target object can specifically be implemented by: determining the service provider corresponding to the target object, and then determining the storage method pre-configured by the service provider, which may include at least two media types. Thus, the at least two media types configured in the storage method can be determined as the at least two media types corresponding to the target object.
[0120] In another possible implementation of this application, at least two media types corresponding to the target object can be determined based on the data characteristics of the target object. These data characteristics may include, for example, access frequency, data importance, and data size.
[0121] For example, regarding access frequency: among the data to be scheduled for a target object, frequently accessed data, such as real-time order information on e-commerce platforms or user activity information on popular social media platforms, is "hot data" that requires fast read/write responses, so high-speed media types (such as SSDs) can be selected for it. Infrequently accessed "cold data", such as backup files, historical records, or rarely accessed archives, can be stored on slower but high-capacity low-speed media types such as HDDs or magnetic tape.
[0122] In another possible implementation of this application, at least two media types corresponding to the target object can be determined based on the storage resources of the distributed storage system. For example, if SSD storage space in the system is limited while HDD resources are plentiful, then for a new target object, HDD resources can be used as fully as is reasonable given its data characteristics, while a small amount of SSD capacity is used for its critical parts. For instance, in a small data center with limited storage resources, a small portion of critical data of newly uploaded user files, such as the file index, may be stored on the limited SSDs, while the file body is stored on HDDs.
[0123] In a practical application, in order to balance storage performance and cost, the at least two media types may include a fast media type and a slow media type.
[0124] In some embodiments, selecting a data node from the target rack to allocate a data to be scheduled can be specifically implemented as follows: determining the target storage medium corresponding to at least one data to be scheduled according to at least two media types, and for any data to be scheduled, selecting a target data node from the unallocated data nodes in the target rack that provides the target storage medium corresponding to the data to be scheduled.
[0125] In embodiments of this application, the control node can record and manage the media types of all storage media in the distributed storage system and the data nodes deployed with those media types. The control node can maintain a detailed record, including the media type of each storage medium in the distributed storage system, such as SSDs being high-speed media and HDDs being low-speed media, as well as their deployment on each data node. For example, the control node records that data node A is equipped with both SSDs and HDDs, while data node B only has HDDs.
[0126] When a new target object needs to be stored and its at least two media types have been identified, the control node can make a series of scheduling decisions based on its recorded information. For example, as described above, the control node can determine at least two media types that match the target object based on the object's data characteristics (such as access frequency, data importance, security requirements, or data size) or on a pre-configured storage method.
[0127] Next, the control node can determine, from the at least two media types, the media type corresponding to each piece of data to be scheduled, and select a target data node that provides the corresponding target storage medium from the unallocated data nodes in the target rack. The data to be scheduled is then allocated to the target storage medium of that target data node for storage. The target storage medium can be any type of storage medium, and the target data node can be any data node providing it. For example, if some of the data to be scheduled for a target object corresponds to SSDs, the control node can search the unallocated data nodes in the target rack that are equipped with SSDs, determine data node A as the target data node, and allocate that data to an SSD on data node A for storage.
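A sketch of the node-selection step in [0124] and [0127]. The `DataNode` record and the `select_target_node` function are hypothetical stand-ins for whatever the control node actually records about media deployment.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DataNode:
    name: str
    rack: str
    media: set = field(default_factory=set)   # media types deployed, e.g. {"SSD", "HDD"}


def select_target_node(nodes, target_rack, medium, allocated, rng=random):
    """Pick a node in target_rack that provides `medium` and is not yet allocated.

    allocated: names of nodes already holding data of this target object.
    Returns None when no node in the rack qualifies.
    """
    candidates = [
        node for node in nodes
        if node.rack == target_rack
        and medium in node.media
        and node.name not in allocated
    ]
    return rng.choice(candidates) if candidates else None
```

For example, if data node A in rack_1 has both SSDs and HDDs and data node B has only HDDs, an SSD piece for rack_1 can only go to A.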
[0128] In this way, the control node can determine a target data node for each of the at least one data to be scheduled of a target object, each providing a target storage medium of the corresponding media type. The target object is thus allocated to the storage media best suited to its characteristics, meeting different data storage needs while ensuring high data availability and efficient operation of the distributed storage system.
[0129] In the process of developing this application, the inventors found that traditional methods schedule data to only one type of storage medium (e.g., SSDs, to ensure storage performance). This requires reserving a significant amount of SSD resources in the distributed storage system to prevent the SSDs from being filled up in abnormal user scenarios, which creates substantial cost pressure. Furthermore, traditional methods trigger an alarm when SSD resource utilization exceeds 80% to protect service availability. However, because SSDs write quickly, even with 20% of resources held in reserve, SSD resources may fill up before operations staff can identify and fix the problem, risking service interruption. Therefore, to further improve service availability, some embodiments determine the target storage medium for the at least one data to be scheduled based on the at least two media types as follows: for any data to be scheduled, check the media types corresponding to the target object in descending order of storage priority, determining for each whether its remaining storage resources meet the storage conditions, until a target media type with sufficient remaining storage resources is found; then determine a storage medium belonging to that target media type as the target storage medium for the data to be scheduled.
[0130] Storage priority can be preset based on one or more factors such as the performance characteristics and cost of different types of storage media.
[0131] For example, based on read / write speed, SSDs can be given higher priority due to their fast read / write capabilities; HDDs, with their relatively slower read / write speeds, may have lower priority than SSDs; and tape, with the slowest read / write speed, generally has even lower priority. However, in scenarios where cost is more sensitive or where large-capacity storage is emphasized, priority settings may differ. For instance, for long-term archived data, tape, due to its large capacity and low cost, may have a higher priority in such scenarios.
[0132] In one optional embodiment of this application, the storage priority can be configured by the service provider. When the storage method configured by the service provider includes at least one media type, it can also include the storage priority of each of those media types. Thus, for any data to be scheduled, the control node can determine, in the storage priority order pre-configured by the service provider, whether the remaining storage resources of each media type corresponding to the target object meet the storage conditions.
[0133] In another alternative embodiment of this application, the storage priority can also be preset by the control node based on one or more factors such as the performance characteristics and cost of different media types of storage media.
[0134] Furthermore, storage priorities can be adjusted based on the specific data requirements of the usage scenario. For example, for real-time transaction data from e-commerce platforms that requires frequent read / write operations and has extremely high response speed requirements, SSDs can be set as a higher priority media type from both performance and importance perspectives. On the other hand, for some historical order data, although the read / write frequency is not high, it is occasionally needed for querying. In this case, HDDs may become a relatively suitable priority choice, ranking after SSDs.
[0135] The remaining storage resources of a media type corresponding to the target object can refer to the unused portion of the storage resources that the service provider has requested for that media type. For example, the service provider may request 30 TB of SSD storage resources or 20 TB of HDD storage resources.
[0136] Storage conditions could be, for example, that the remaining storage resources are greater than the size of the data to be scheduled, or that the remaining storage resources are greater than the size of the data to be scheduled and the difference between the remaining storage resources and the size of the data to be scheduled is greater than a certain threshold, so as to ensure that the remaining storage resources are sufficient to write the data to be scheduled.
[0137] With the technical solution of this embodiment, data to be scheduled is placed on a high-priority storage medium while resources of that media type are sufficient. If the remaining storage resources of the high-priority media type are insufficient, adaptive storage degradation can be applied, further improving storage service availability and preventing storage failures. For example, suppose the at least two media types include a first media type and a second media type, where the first media type has a higher storage priority than the second. The data to be scheduled for the target object is then preferentially scheduled to storage media of the first media type; if the target object's remaining storage resources of the first media type are insufficient, scheduling degrades to storage media of the second media type.
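The degradation logic of [0129] and [0137] can be sketched as a walk over media types in priority order. The function name, the GB unit, and the `margin` parameter (modeling the second storage condition in [0136]) are assumptions; with `priorities=["SSD", "HDD"]` this is also the fast/slow fallback described in [0140].

```python
def pick_media_type(priorities, remaining, size, margin=0):
    """Return the highest-priority media type whose remaining quota can take `size`.

    priorities: media types ordered from highest to lowest storage priority.
    remaining: remaining storage resources per media type for this service
        provider, in any consistent unit (GB here).
    margin: extra headroom that must be left after writing; 0 gives the plain
        "remaining > size" storage condition.
    Returns None if no media type satisfies the storage condition.
    """
    for media_type in priorities:
        if remaining.get(media_type, 0) - size > margin:
            return media_type
    return None
```

A 20 GB write against {"SSD": 10, "HDD": 100} degrades from SSD to HDD instead of failing.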
[0138] In some embodiments, as described above, the at least two media types corresponding to the target object may include at least a fast media type and a slow media type.
[0139] Among these, the storage priority of fast media types can be higher than that of slow media types. For any data to be scheduled, the remaining storage resources of each media type corresponding to the target object are sequentially determined according to the storage priority from high to low to see if they meet the storage conditions, until the target media type whose remaining storage resources meet the storage conditions is found. This process can include:
[0140] For any data to be scheduled, determine whether the remaining storage resources of the fast media type corresponding to the target object meet the storage conditions; if yes, determine that the target media type corresponding to the data to be scheduled is a fast media type; if no, determine that the target media type corresponding to the data to be scheduled is a slow media type.
[0141] Storage services are initially provided to the service provider using fast media; when the storage resources of the fast media type become insufficient, scheduling degrades to slow media, which then take over service provision. This balances storage performance and cost, improving service availability without sacrificing performance or increasing costs. In some embodiments, the at least one data to be scheduled is multiple replicas of the target object.
[0142] In this context, replica data refers to data obtained by copying the target object. In distributed storage systems, multiple replicas of the target object can be created to improve data reliability, availability, redundancy, and to cope with potential failures and disaster recovery.
[0143] The above-mentioned determination of at least two media types corresponding to the target object from multiple media types can be specifically implemented as follows: determining the storage method corresponding to the target object from multiple media types; the storage method may include at least two media types and the number of copies corresponding to each of the at least two media types.
[0144] In some embodiments, determining the target storage medium corresponding to at least one data to be scheduled according to at least two media types, and selecting the target data node that provides the target storage medium corresponding to the data to be scheduled from the unallocated data nodes in the target rack for any data to be scheduled, can be specifically implemented as follows: determining the target storage medium corresponding to multiple replica data according to at least two media types and the number of replicas corresponding to at least two media types, and determining the target data node that provides the target storage medium from the unallocated data nodes in the target rack.
[0145] Considering the characteristics of different storage media types, multiple copies of a target object's data can be allocated across media types. For example, the at least two media types can include high-speed and low-speed media types. During the development of this application, the inventors found that in traditional methods, all data to be scheduled is written to a single media type, either high-speed or low-speed, resulting in low service availability of the distributed storage system. To improve service availability, this embodiment uses at least two media types to store the target object's data to be scheduled, which suits application scenarios with differing read/write performance requirements. Some copies can be stored on SSDs, leveraging their fast read/write performance to meet frequent data access needs, while other copies can be stored on slower but high-capacity media such as HDDs or magnetic tape for data backup and long-term archiving.
[0146] For example, in a practical application, a service provider can pre-configure two media types, SSD and HDD, for target objects that are insensitive to write latency but sensitive to read latency, and allocate a number of replicas to each media type. For instance, for a target object with three replicas, the number of SSD replicas can be configured as one and the number of HDD replicas as two. When a data scheduling request for the three replicas of the target object is received from the service provider, one replica is placed on an SSD and the other two on HDDs. When the target object is read, the replica on the SSD can be read to ensure read speed. Compared with storing all three replicas on SSDs, this 1 SSD + 2 HDD configuration has lower storage cost (only one replica is stored on an SSD) and write latency close to that of HDDs, while its read performance is comparable to that of SSDs. This approach achieves high read performance while reducing storage costs.
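The 1 SSD + 2 HDD storage method can be expanded into a per-replica plan as in the following sketch. The dict-based storage method, the function names, and the fixed read preference are illustrative assumptions.

```python
def expand_storage_method(storage_method):
    """Turn {media type: replica count} into one target media type per replica."""
    plan = []
    for media_type, count in storage_method.items():   # dicts keep insertion order
        plan.extend([media_type] * count)
    return plan


def replica_to_read(plan, read_preference=("SSD", "HDD")):
    """Index of the replica to read: the first one on the most preferred media type."""
    for media_type in read_preference:
        if media_type in plan:
            return plan.index(media_type)
    return None


plan = expand_storage_method({"SSD": 1, "HDD": 2})
print(plan)                    # ['SSD', 'HDD', 'HDD']
print(replica_to_read(plan))   # 0
```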
[0147] The storage method indicates the media type and the corresponding number of copies, which can be flexibly selected according to actual application needs.
[0148] In the embodiments of this application, the allocation of the number of replicas across different media can also be determined in conjunction with the resource status of the storage system. For example, if SSD resources are limited in the distributed storage system, or the storage resources of the SSDs requested by the service provider are small, the number of replicas allocated to the SSDs can be appropriately reduced while increasing the number of replicas on other media such as HDDs, provided that certain performance is guaranteed.
[0149] In some embodiments, in an erasure coding scenario, the at least one data to be scheduled consists of the multiple data blocks and multiple parity blocks into which the target object is divided.
[0150] The above-mentioned determination of at least two media types corresponding to the target object from multiple media types can be specifically implemented as follows: determining the storage method corresponding to the target object from multiple media types; the storage method includes at least two media types and the storage priorities corresponding to at least two media types respectively;
[0151] The determination of the target storage medium corresponding to each of the at least one data to be scheduled, according to the aforementioned at least two media types, may include: determining, according to the at least two media types, a target storage medium of a first media type corresponding to a first number of data blocks, a target storage medium of a second media type corresponding to a second number of data blocks, and a target storage medium of a third media type corresponding to multiple parity blocks; the storage priority of the first media type is higher than that of the second media type; the storage priority of the second media type is higher than that of the third media type; wherein, the storage priority may be determined in combination with storage performance and / or storage cost. The first number of data blocks is different from the second number of data blocks; the first number of data blocks and the second number of data blocks constitute the multiple data blocks.
[0152] As described above, the service provider can configure the storage priority corresponding to each media type. In the embodiments of this application, the control node can first determine the first media type according to the storage priority from high to low. A first number of data blocks can be stored in the storage medium of the first media type. Then, the second media type is determined, and a second number of data blocks can be stored in the storage medium of the second media type.
[0153] Parity blocks are used mainly to protect data integrity, so their read/write speed requirements are typically lower than those of data blocks. They can therefore be stored on a third media type with a relatively lower priority. Optionally, the third media type can be the same as the second media type, or it can be a different media type, such as magnetic tape. The control node can assign the multiple parity blocks to target storage media of the third media type.
[0154] In this way, the control node can allocate the multiple data blocks and multiple parity blocks of the target object to target storage media of different media types according to storage priorities determined by storage performance and/or storage cost, achieving efficient, economical, and reliable storage in the distributed storage system while accounting for performance, cost, and data integrity.
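A sketch of the erasure-coded placement in [0151] to [0153]. The 4 data + 2 parity layout in the usage line is only an example, and the function name and return shape are assumptions.

```python
def place_ec_blocks(num_data, num_parity, first_count, media_by_priority):
    """Map erasure-coded blocks to media types.

    media_by_priority: (first, second, third) media types, highest priority first.
        The third may equal the second or be a different type such as tape.
    The first `first_count` data blocks go to the first media type, the
    remaining data blocks to the second, and every parity block to the third.
    """
    first, second, third = media_by_priority
    second_count = num_data - first_count
    if not 0 <= first_count <= num_data:
        raise ValueError("first_count must be between 0 and num_data")
    if first_count == second_count:
        raise ValueError("the first and second numbers of data blocks must differ")
    return {
        "data": [first] * first_count + [second] * second_count,
        "parity": [third] * num_parity,
    }


print(place_ec_blocks(4, 2, 1, ("SSD", "HDD", "TAPE")))
# {'data': ['SSD', 'HDD', 'HDD', 'HDD'], 'parity': ['TAPE', 'TAPE']}
```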
[0155] In some embodiments, the method may further include storing at least one data to be scheduled in a target storage medium in its respective target data node.
[0156] In some embodiments, the method may further include: obtaining a data scheduling request sent by the service provider.
[0157] Specifically, storing at least one data to be scheduled in the target storage medium of its corresponding target data node can be implemented by: feeding back to the service provider the target storage medium and target data node corresponding to at least one data to be scheduled, so that the service provider can store at least one data to be scheduled in the target storage medium of its corresponding target data node.
[0158] In the embodiments of this application, after receiving a data scheduling request from the service provider, the control node can determine the target storage medium corresponding to each piece of data to be scheduled and the target data node providing the target storage medium. Then, the control node can feed back the storage location information of each piece of data to be scheduled, i.e., the target storage medium and the target data node providing the target storage medium, to the service provider. For example, after the control node determines that the target storage medium for a certain piece of data to be scheduled is an SSD on data node A, it can send location information such as "data to be scheduled X, target storage medium is an SSD on data node A" to the service provider.
[0159] In one possible implementation of this application, the location information can be fed back in various forms. For example, it can be returned as structured data through an application programming interface (API), such as a JSON array in which each entry contains a piece of data to be scheduled, its target storage medium, and its target data node. Alternatively, it can be published to a message queue, from which the service provider obtains the feedback.
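A JSON payload of the kind mentioned in [0159] could be produced as shown below. This is only a sketch: the function name `build_feedback` and the field names `data_id`, `target_storage_medium`, and `target_data_node` are illustrative assumptions, not an API defined by this application. The example reuses the "data to be scheduled X on an SSD of data node A" case from [0158].

```python
import json


def build_feedback(assignments):
    """Serialize placement decisions as a JSON array.

    `assignments` is a list of (data_id, media_type, data_node) tuples;
    the field names below are hypothetical, chosen for illustration only.
    """
    return json.dumps(
        [
            {"data_id": d, "target_storage_medium": m, "target_data_node": n}
            for d, m, n in assignments
        ]
    )


payload = build_feedback([("X", "SSD", "data-node-A")])
print(payload)
# [{"data_id": "X", "target_storage_medium": "SSD", "target_data_node": "data-node-A"}]
```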
[0160] After receiving location information from the control node, the service provider can initiate its own data transmission and storage processes based on that information. The service provider can utilize its network connectivity to send the data to be scheduled to the corresponding target data node. For example, if the service provider is a cloud storage client application, it can establish a network connection with the target data node (which may be located within the cloud storage data center) according to the feedback information (e.g., using HTTP, HTTPS, or other protocols over the internet), and then upload the local data to be scheduled to the specified target storage medium on the target data node.
[0161] In some embodiments, the method further includes: obtaining a data acquisition request for the target object; determining a target storage medium with a high read priority corresponding to the target object according to the read priorities corresponding to at least two media types; and reading the data corresponding to the target object from the target storage medium.
[0162] In the embodiments of this application, the control node can determine the target storage medium with high read priority based on predefined read priority rules for different media types. The read priority may be, but is not limited to, the same as the storage priority described above. The read priority can also be set according to read performance requirements; for example, for target objects sensitive to read latency, high-speed media types can be given a higher read priority. Using read priorities helps guarantee read performance. Optionally, if the target storage medium with the highest read priority fails, the data corresponding to the target object can be read from the target storage medium with the next-highest read priority, further ensuring service availability.
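The read-priority lookup with fallback in [0162] can be sketched as a loop over media types in descending read priority. Several things here are assumptions made for illustration: each media type is taken to hold a readable copy of the object, the storage locations are identified by strings, and `MediumUnavailable`, `read_object`, and `fake_read` are hypothetical names.

```python
class MediumUnavailable(Exception):
    """Raised by a reader when a storage medium cannot serve a read."""


def read_object(locations, read_priority, read_fn):
    """Read the object from the medium with the highest read priority,
    falling back to the next one whenever a read fails."""
    ordered = sorted(locations, key=lambda t: read_priority.get(t, 0), reverse=True)
    failures = []
    for media_type in ordered:
        try:
            return read_fn(locations[media_type])
        except MediumUnavailable as exc:
            failures.append(f"{media_type}: {exc}")
    raise MediumUnavailable("; ".join(failures) or "no locations given")


# Simulated media: the SSD copy on node A is unreachable, the HDD copy works.
available = {"hdd://node-B/obj": b"payload"}


def fake_read(location):
    if location not in available:
        raise MediumUnavailable(f"cannot read {location}")
    return available[location]


data = read_object(
    {"SSD": "ssd://node-A/obj", "HDD": "hdd://node-B/obj"},
    {"SSD": 2, "HDD": 1},
    fake_read,
)
print(data)  # b'payload'
```

Collecting the per-medium failures before raising keeps the cause visible when every medium is unavailable.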
[0163] In some embodiments, the method may further include providing the service provider with multiple media types.
[0164] The above-mentioned determination of at least two media types corresponding to the target object from multiple media types can be specifically implemented as: obtaining at least two media types selected by the service provider from multiple media types.
[0165] In the embodiments of this application, the control node provides the service provider with information on various media types, enabling the service provider to fully understand the various storage media options available for storing data. This allows the service provider to make more appropriate storage decisions based on its own storage needs, data characteristics, and cost considerations.
[0166] In addition, the service provider can configure the storage priority and read priority of different media types, as well as the amount of data to be scheduled stored in each media type.
[0167] By configuring storage priorities, read priorities, and the amount of data to be scheduled stored on each media type, the service provider can define a storage scheme for the data to be scheduled of a target object. The storage scheme can specify which media type each type of data is stored on, the priority with which data is stored and read, and the amount of data stored on each media type. Through the embodiments of this application, the service provider can select media types according to its own needs, improving service flexibility while ensuring service availability.
[0168] In some embodiments, the method may further include: generating a data scheduling request if any copy of the target object is detected to be abnormal.
[0169] Specifically, determining at least one data to be scheduled corresponding to the target object can be achieved by using a copy of the target object as the data to be scheduled.
[0170] In the embodiments of this application, when an anomaly is detected in any copy, the control node needs to take timely measures to repair it in order to ensure data integrity, redundancy, and availability. In this case, a data scheduling request can be generated. The data scheduling request is used to request that the abnormal copy be replaced or repaired, either by scheduling another normal copy or by regenerating a copy, so that the number and content of the target object's copies across storage media and data nodes are restored to a normal state and the data can still be reliably accessed and used. For example, in a distributed storage system storing important enterprise documents, if an anomaly is detected in the copy on a certain data node, the system can, after generating a data scheduling request, schedule a normal copy from another data node to replace the abnormal copy, or regenerate a copy from the original data and store it in a suitable location to maintain the redundant backup state of the data.
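The anomaly-triggered request generation in [0168] to [0170] could look roughly like the sketch below. The application does not say how anomalies are detected. Checksum comparison is used here purely as an assumed example, and the `Replica` class, the `make_scheduling_requests` function, and the request fields are hypothetical. Each request names the abnormal copy and a healthy source copy. The target placement would then come from the rack-weight scheduling described earlier.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    node: str
    content: bytes


def is_abnormal(replica, expected_digest):
    """Assumed detection rule: a replica is abnormal if its SHA-256 digest differs."""
    return hashlib.sha256(replica.content).hexdigest() != expected_digest


def make_scheduling_requests(object_id, replicas, expected_digest):
    """Return one (hypothetical) scheduling request per abnormal replica."""
    healthy = [r for r in replicas if not is_abnormal(r, expected_digest)]
    abnormal = [r for r in replicas if is_abnormal(r, expected_digest)]
    if abnormal and not healthy:
        raise RuntimeError("no healthy replica left; the copy must be regenerated")
    return [
        {"object_id": object_id, "abnormal_node": bad.node, "source_node": healthy[0].node}
        for bad in abnormal
    ]


good = b"enterprise document"
digest = hashlib.sha256(good).hexdigest()
reqs = make_scheduling_requests(
    "obj-1", [Replica("A", good), Replica("B", b"corrupted")], digest
)
print(reqs)
# [{'object_id': 'obj-1', 'abnormal_node': 'B', 'source_node': 'A'}]
```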
[0171] Figure 2 illustrates a distributed storage system according to an embodiment of this application. The distributed storage system includes a control node 201 and multiple data nodes 202, and the multiple data nodes are distributed across multiple racks 203.
[0172] Control node 201 is used to respond to a data scheduling request by: determining at least one data to be scheduled for a target object; determining weight coefficients corresponding to multiple racks 203 respectively; the initial value of the weight coefficients is determined based on the number of data nodes deployed in the racks 203; determining at least one target rack to allocate at least one data to be scheduled based on the weight coefficients corresponding to the multiple racks 203 respectively; and attenuating the weight coefficients corresponding to the at least one target rack respectively based on the number of data to be scheduled for the target object allocated to the at least one target rack.
[0173] Data node 202 is used to store any allocated data to be scheduled into the corresponding storage medium.
[0174] In practical applications, distributed storage systems can provide storage services to service providers. When a distributed storage system is used as the underlying storage system, the service provider can refer to a storage service system built on top of the distributed storage system, such as OSS (Object Storage Service) or EBS (Elastic Block Store). Of course, a distributed storage system can also be used as a user-facing storage system, in which case the service provider is the user.
[0175] For the specific implementation of the control node and the data nodes, refer to the data scheduling method shown in Figure 1; details are not repeated here.
[0176] To facilitate understanding, the technical solution of the embodiments of this application is introduced below with reference to the scenario interaction diagram shown in Figure 3.
[0177] As shown in Figure 3, service provider 301 can send a data scheduling request to distributed storage system 302, and distributed storage system 302 can receive the data scheduling request through control node 201.
[0178] Distributed storage system 302 can have three racks: rack 3021, rack 3022, and rack 3023. Data nodes 2021 and 2022 are deployed in rack 3021, data nodes 2023 and 2024 in rack 3022, and data node 2025 in rack 3023.
[0179] Based on this, the control node can set the initial weight coefficients of racks 3021, 3022, and 3023 in proportion to the number of data nodes deployed in each rack: 304, 304, and 152, respectively (that is, 152 per data node).
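The arithmetic behind [0179] can be checked with a short sketch. The per-node base of 152 is inferred from the example values (304 = 2 × 152, 152 = 1 × 152). The functions `initial_weights` and `node_probability` are hypothetical helpers. The sketch also compares per-node selection probabilities under weighted rack selection and under the uniform random rack selection criticized in [0004]. With weights proportional to node count, every node has the same chance of being chosen. With uniform rack selection, the lone node in rack 3023 is twice as likely to be chosen as the others, which is the hotspot effect described in the background.

```python
from fractions import Fraction


def initial_weights(nodes_per_rack, weight_per_node):
    """Initial rack weight = number of data nodes in the rack * per-node base."""
    return {rack: count * weight_per_node for rack, count in nodes_per_rack.items()}


def node_probability(nodes_per_rack, rack_prob):
    """Chance that one specific node is picked when a rack is drawn with
    probability rack_prob[rack] and a node is then drawn uniformly inside it."""
    return {rack: rack_prob[rack] / count for rack, count in nodes_per_rack.items()}


racks = {"3021": 2, "3022": 2, "3023": 1}
weights = initial_weights(racks, 152)
print(weights)  # {'3021': 304, '3022': 304, '3023': 152}

total = sum(weights.values())
weighted = node_probability(racks, {r: Fraction(w, total) for r, w in weights.items()})
uniform = node_probability(racks, {r: Fraction(1, len(racks)) for r in racks})
print(weighted)  # every node: 1/5
print(uniform)   # nodes in 3021 and 3022: 1/6 each; the node in 3023: 1/3
```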
[0180] In response to a data scheduling request, control node 201 can determine at least one piece of data to be scheduled corresponding to a target object. In this example, the target object may correspond to two pieces of data to be scheduled, and these two pieces of data to be scheduled may be replicas of the target object. It is assumed that at most one replica of the data is placed in each rack.
[0181] Then, control node 201 can select two target racks from racks 3021, 3022, and 3023 according to the weight coefficients corresponding to racks 3021, 3022, and 3023, and randomly select a data node from each of the two target racks to allocate the two data items to be scheduled. The specific selection method for the target racks is detailed in the previous embodiments and will not be repeated here. In this example, racks 3021 and 3022 can be determined as target racks, so that one data item to be scheduled can be allocated to a data node in rack 3021, and the other data item to be scheduled can be allocated to a data node in rack 3022.
[0182] After allocating the data to be scheduled to the corresponding target rack, control node 201 can reduce the weight coefficient of the target rack based on the amount of data to be scheduled for the target object allocated to that rack. For example, if the weight coefficient of rack 3021 is 304, and the control node allocates one piece of data to rack 3021, the weight coefficient of rack 3021 can be reduced by a fixed target value or target proportion, such as a target value of 2. After allocating one piece of data to rack 3021, the reduced weight coefficient will be 302. Then, in the next allocation of data to be scheduled, rack 3021 will participate in the allocation process with a weight coefficient of 302.
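The attenuation step in [0182] allows either a fixed target value or a target proportion. The sketch below shows both options. Several details are assumptions not stated in the text: the decay is applied once per allocated piece, the ratio is applied multiplicatively, and the weight is floored at zero. The function name `attenuate` is hypothetical. The first call reproduces the 304 to 302 example.

```python
def attenuate(weight, allocated, decay_value=None, decay_ratio=None):
    """Reduce a rack weight after `allocated` pieces of the object land on it.

    Exactly one of decay_value (subtracted per piece) or decay_ratio
    (weight multiplied by 1 - ratio per piece) must be given. The per-piece
    scaling and the floor at zero are assumptions made for this sketch.
    """
    if (decay_value is None) == (decay_ratio is None):
        raise ValueError("pass exactly one of decay_value or decay_ratio")
    if decay_value is not None:
        return max(weight - decay_value * allocated, 0)
    return weight * (1 - decay_ratio) ** allocated


print(attenuate(304, 1, decay_value=2))    # 302
print(attenuate(304, 1, decay_ratio=0.5))  # 152.0
```

A fixed value lowers every rack by the same absolute amount. A ratio lowers heavily weighted racks by more in absolute terms, which keeps the relative decay the same across racks.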
[0183] Attenuating the weight coefficient of the target rack reduces the probability that the target rack, and therefore any data node within it, will be selected in subsequent allocations. This helps keep data balanced across data nodes. It also prevents a rack that has a high initial weight coefficient, or that was selected several times early on, from being overloaded with data storage tasks and becoming increasingly unbalanced.
[0184] Figure 4 illustrates a data scheduling apparatus according to an embodiment of the present application. The apparatus can be applied to a distributed storage system that includes multiple data nodes distributed across multiple racks. The apparatus includes:
[0185] The first determining module 401 is used to determine at least one data to be scheduled for the target object in response to a data scheduling request;
[0186] The second determining module 402 is used to determine the weight coefficients corresponding to the multiple racks respectively; the initial value of the weight coefficient is determined according to the number of data nodes deployed in the rack;
[0187] The third determining module 403 is used to determine, based on the weight coefficients corresponding to the multiple racks respectively, at least one target rack to allocate the at least one data to be scheduled;
[0188] The attenuation module 404 is used to attenuate the weight coefficients corresponding to the at least one target rack based on the number of data to be scheduled for the target object allocated to the at least one target rack.
[0189] In some embodiments, the third determining module 403 is specifically used for:
[0190] Select a target rack from the rack set based on the weight coefficients corresponding to multiple racks;
[0191] Select a data node from the target rack to allocate a piece of data to be scheduled;
[0192] If the number of data to be scheduled for the target object allocated to the target rack reaches the storage constraint limit, delete the selected target rack from the rack set to update the rack set. Then return to the step of randomly selecting a target rack from the rack set according to the weight coefficients, and continue until the at least one data to be scheduled has been allocated.
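The selection loop in [0190] to [0192] can be sketched as follows. This is an illustration under assumptions, not the definitive procedure. `allocate` is a hypothetical name. Node names are assumed unique across racks. A rack is also dropped once it has no unused node left. The rack set is rebuilt on every call, which corresponds to restoring it to all racks after an object has been allocated, as in claim 2. Weighted drawing uses Python's `random.Random.choices`. Attenuation is applied separately, after this function returns.

```python
import random


def allocate(pieces, weights, nodes, limit, rng=None):
    """Allocate the pieces of one object to racks chosen by weight.

    weights: rack -> current weight coefficient (read only here).
    nodes:   rack -> list of data node names in that rack.
    limit:   storage constraint, i.e. max pieces of this object per rack.
    Returns a list of (piece, rack, node) tuples.
    """
    rng = rng if rng is not None else random.Random()
    # Rebuilt on every call, so the rack set is "restored" for the next object.
    rack_set = {rack for rack, w in weights.items() if w > 0 and nodes.get(rack)}
    per_rack = {}
    used_nodes = set()
    result = []
    for piece in pieces:
        if not rack_set:
            raise RuntimeError("not enough racks satisfy the storage constraint")
        candidates = sorted(rack_set)  # sorted for reproducibility with a seeded rng
        rack = rng.choices(candidates, weights=[weights[r] for r in candidates])[0]
        free = [n for n in nodes[rack] if n not in used_nodes]
        node = rng.choice(free)  # uniform choice of a data node inside the rack
        used_nodes.add(node)
        result.append((piece, rack, node))
        per_rack[rack] = per_rack.get(rack, 0) + 1
        # Drop the rack once it reaches the limit or has no unused node left.
        if per_rack[rack] >= limit or len(free) == 1:
            rack_set.discard(rack)
    return result


weights = {"3021": 304, "3022": 304, "3023": 152}
nodes = {"3021": ["2021", "2022"], "3022": ["2023", "2024"], "3023": ["2025"]}
placement = allocate(["replica-1", "replica-2"], weights, nodes, limit=1, rng=random.Random(7))
print(placement)
```

With `limit=1`, this reproduces the "at most one replica per rack" assumption of the Figure 3 example. The per-rack counts in the result are the inputs to the attenuation step.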
[0193] In some embodiments, the attenuation module 404 may specifically be used for:
[0194] For any target rack, determine the weight coefficient decay value or decay ratio corresponding to the number of data to be scheduled for the target object allocated to that target rack;
[0195] Attenuate the weight coefficient corresponding to the target rack using the decay value or decay ratio.
[0196] In some embodiments, the device further includes:
[0197] The initialization module is used to reset the weight coefficient of any target rack to its initial value when the weight coefficient of any target rack decays to a preset threshold.
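The reset behavior of the initialization module in [0197] can be sketched as follows. The text can be read as resetting only the rack whose weight reached the threshold, or as resetting all racks, as claim 4 does under a different trigger. The `reset_all` flag in this sketch is a hypothetical parameter that covers both readings. `apply_decay` is also a hypothetical name, and a fixed decay value per allocated piece is assumed.

```python
def apply_decay(weights, initial, counts, decay_value, threshold, reset_all=True):
    """Attenuate weights by decay_value per allocated piece, then reset any
    weight that has decayed to the threshold (or all weights if reset_all)."""
    decayed = dict(weights)
    for rack, count in counts.items():
        decayed[rack] = max(decayed[rack] - decay_value * count, 0)
    exhausted = [rack for rack, w in decayed.items() if w <= threshold]
    if exhausted and reset_all:
        return dict(initial)
    for rack in exhausted:
        decayed[rack] = initial[rack]
    return decayed


initial = {"3021": 304, "3022": 304, "3023": 152}
current = {"3021": 4, "3022": 100, "3023": 50}
print(apply_decay(current, initial, {"3021": 1}, decay_value=2, threshold=2))
# {'3021': 304, '3022': 304, '3023': 152}
print(apply_decay(current, initial, {"3021": 1}, decay_value=2, threshold=2, reset_all=False))
# {'3021': 304, '3022': 100, '3023': 50}
```

Resetting all racks together preserves the node-count proportions of the initial weights. Resetting a single rack restores only that rack's share.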
[0198] In some embodiments, the device further includes:
[0199] The anomaly detection module is used to generate a data scheduling request when any copy of the target object is found to be abnormal.
[0200] In some embodiments, the first determining module 401 is specifically used to determine the copy data of the target object as the data to be scheduled.
[0201] In some embodiments, the distributed storage system provides storage media of multiple media types, and storage media of at least one media type are deployed in each of the multiple data nodes.
[0202] In some embodiments, the device further includes:
[0203] The media determination module is used to determine, from the multiple media types, at least two media types corresponding to the target object.
[0204] Correspondingly, selecting a data node from the target rack to allocate a piece of data to be scheduled includes:
[0205] Based on the at least two media types, determining the target storage medium corresponding to each of the at least one data to be scheduled; and, for any data to be scheduled, selecting, from the unallocated data nodes in the target rack, a target data node that provides the target storage medium corresponding to that data to be scheduled.
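The media-aware node selection in [0205] narrows the in-rack choice to data nodes that have not yet received a piece of the target object and that provide the required medium. A minimal sketch follows. `pick_node` and its parameters are hypothetical. Each node's media are modeled as a set of type names, and `None` is returned when no node qualifies, so the caller can try another rack.

```python
import random


def pick_node(rack_nodes, node_media, target_medium, allocated, rng=None):
    """Pick a data node in the rack that has not yet received a piece of this
    object and that provides target_medium; return None if none qualifies."""
    rng = rng if rng is not None else random.Random()
    candidates = [
        node
        for node in rack_nodes
        if node not in allocated and target_medium in node_media.get(node, ())
    ]
    return rng.choice(candidates) if candidates else None


rack_nodes = ["2021", "2022"]
node_media = {"2021": {"SSD", "HDD"}, "2022": {"HDD"}}
print(pick_node(rack_nodes, node_media, "SSD", allocated=set()))     # 2021
print(pick_node(rack_nodes, node_media, "SSD", allocated={"2021"}))  # None
```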
[0206] In some embodiments, the device further includes:
[0207] The storage module is used to store at least one data to be scheduled into the target storage medium in its respective target data node.
[0208] In some embodiments, the device further includes:
[0209] The request obtaining module is used to obtain the data scheduling request sent by the service provider.
[0210] In some embodiments, the storage module is specifically used to:
[0211] Feed back, to the service provider, the target storage medium and target data node corresponding to each of the at least one data to be scheduled, so that the service provider can store the data to be scheduled in the target storage medium of the corresponding target data node.
[0212] The data scheduling device shown in Figure 4 can perform the data scheduling method described in the embodiment shown in Figure 1; its implementation principle and technical effects are not repeated here. The specific manner in which each module and unit of the data scheduling device performs its operations has been described in detail in the related method embodiments and is not elaborated upon here.
[0213] This application also provides a computing device. As shown in Figure 5, the device may include a storage component 501 and a processing component 502;
[0214] The storage component 501 stores one or more computer instructions, wherein the one or more computer instructions are called and executed by the processing component 502 to implement the data scheduling method provided in the embodiments of this application.
[0215] Of course, computing devices may also include other components, such as input / output interfaces, display components, communication components, etc.
[0216] Input / output interfaces provide interfaces between processing components and peripheral interface modules, which can be output devices, input devices, etc. Communication components are configured to facilitate wired or wireless communication between computing devices and other devices.
[0217] The processing component may include one or more processors to execute computer instructions to complete all or part of the steps in the above-described method. Alternatively, the processing component may be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method.
[0218] Storage components are configured to store various types of data to support operations on the terminal. Storage components can be implemented from any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic storage, flash memory, magnetic disk, or optical disk.
[0219] It should be noted that the aforementioned computing devices can be physical devices or elastic computing hosts provided by cloud computing platforms. They can be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or a single terminal device.
[0220] This application also provides a computer-readable storage medium storing a computer program which, when executed by a computer, can implement the data scheduling method of the embodiment shown in Figure 1. The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist independently without being assembled into the electronic device.
[0221] This application also provides a computer program product, which includes a computer program carried on a computer-readable storage medium; when executed by a computer, the computer program can implement the data scheduling method of the embodiment shown in Figure 1. In such an embodiment, the computer program may be downloaded and installed from a network, and / or installed from a removable medium. When the computer program is executed by a processor, it performs the various functions defined in the system of this application.
[0222] Those skilled in the art will clearly understand that, for convenience and brevity of description, reference can be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the systems, devices, and units described above; details are not repeated here.
[0223] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.
[0224] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.
[0225] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.
Claims
1. A data scheduling method, characterized in that the method is applied to a distributed storage system, wherein the distributed storage system includes multiple data nodes distributed across multiple racks, and the method includes: in response to a data scheduling request, determining at least one piece of data to be scheduled for a target object; determining weight coefficients corresponding to the multiple racks respectively; determining, based on the weight coefficients corresponding to the multiple racks, at least one target rack to allocate the at least one data to be scheduled; and attenuating the weight coefficients corresponding to the at least one target rack based on the number of data to be scheduled for the target object allocated to the at least one target rack.
2. The method of claim 1, wherein the multiple racks form a rack set, and determining at least one target rack based on the weight coefficients corresponding to the multiple racks includes: selecting a target rack from the rack set based on the weight coefficients corresponding to each rack in the rack set; selecting a data node from the target rack to allocate a piece of data to be scheduled; if the number of data to be scheduled for the target object allocated to the target rack reaches the storage constraint limit, deleting the selected target rack from the rack set to update the rack set, and returning to the step of randomly selecting a target rack from the rack set according to the weight coefficients corresponding to each rack in the rack set, until the allocation of the at least one data to be scheduled is completed; and, after the allocation of the at least one data to be scheduled is completed, restoring the rack set to the multiple racks.
3. The method of claim 1, wherein attenuating the weight coefficients corresponding to the at least one target rack based on the number of data to be scheduled for the target object allocated to the at least one target rack includes: for any target rack, determining a weight coefficient decay value or weight coefficient decay ratio corresponding to the number of data to be scheduled for the target object allocated to the target rack; and attenuating the weight coefficient corresponding to the target rack using the weight coefficient decay value or the weight coefficient decay ratio.
4. The method according to claim 1 or 3, characterized in that the initial value of the weight coefficient is determined based on the number of storage media deployed in the rack, and the method further includes: calculating a total allocation value corresponding to the multiple racks based on the total amount of data allocated to the multiple racks; calculating the sum of the initial values of the weight coefficients corresponding to the multiple racks; and, when the ratio of the total allocation value to the sum of the weights reaches a predetermined value, updating the weight coefficients corresponding to the multiple racks to their initial values.
5. The method of claim 1, wherein, after determining at least one target rack to allocate the at least one data to be scheduled based on the weight coefficients corresponding to the multiple racks, the method further includes: determining whether the number of data to be scheduled for the target object allocated to any target rack exceeds the storage constraint limit; if so, re-determining at least one target rack to allocate the at least one data to be scheduled based on the initial values of the weight coefficients corresponding to the multiple racks; and if not, performing the operation of attenuating the weight coefficients corresponding to the at least one target rack based on the number of data to be scheduled for the target object allocated to the at least one target rack.
6. The method of claim 5, wherein determining whether the number of data to be scheduled for the target object allocated to any target rack exceeds the storage constraint limit includes: if the weight coefficient of any rack decays to a preset threshold, determining whether the number of data to be scheduled for the target object allocated to any target rack exceeds the storage constraint limit.
7. The method of claim 1, further including: if any copy of the target object is detected to be abnormal, generating the data scheduling request; wherein determining the at least one data to be scheduled for the target object includes: determining the copy data of the target object as the data to be scheduled.
8. The method of claim 2, wherein the distributed storage system provides storage media of multiple media types, and storage media of at least one media type are deployed in each of the multiple data nodes; the method further includes: determining, from the multiple media types, at least two media types corresponding to the target object; and selecting a data node from the target rack to allocate a piece of data to be scheduled includes: determining, based on the at least two media types, the target storage medium corresponding to each of the at least one data to be scheduled, and, for any data to be scheduled, selecting, from the unallocated data nodes in the target rack, a target data node that provides the target storage medium corresponding to that data to be scheduled.
9. The method of claim 1, further including: storing the at least one data to be scheduled in the target storage medium of its corresponding target data node.
10. The method of claim 9, further including: obtaining the data scheduling request sent by a service provider; wherein storing the at least one data to be scheduled in the target storage medium of its corresponding target data node includes: feeding back, to the service provider, the target storage medium and target data node corresponding to each of the at least one data to be scheduled, so that the service provider stores the at least one data to be scheduled in the target storage medium of the corresponding target data node.
11. A distributed storage system, characterized in that the distributed storage system includes a control node and multiple data nodes, the multiple data nodes being distributed across multiple racks; the control node is used to: determine at least one data to be scheduled for a target object in response to a data scheduling request; determine weight coefficients corresponding to the multiple racks respectively, wherein the initial value of the weight coefficient is determined based on the number of data nodes deployed in the rack; determine, based on the weight coefficients corresponding to the multiple racks, at least one target rack to allocate the at least one data to be scheduled; and attenuate the weight coefficients corresponding to the at least one target rack respectively, based on the number of data to be scheduled for the target object allocated to the at least one target rack; and the data node is used to store any allocated data to be scheduled into the corresponding storage medium.
12. A computing device, characterized by comprising a processing component and a storage component, wherein the storage component stores one or more computer instructions, and the one or more computer instructions are invoked and executed by the processing component to implement the data scheduling method according to any one of claims 1 to 10.
13. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processing component, implements the data scheduling method according to any one of claims 1 to 10.
14. A computer program product, characterized by including a computer program / instructions which, when executed by a processing component, implement the data scheduling method according to any one of claims 1 to 10.