Disk processing method, distributed storage system, device and storage medium

By monitoring the health status of disks in the distributed storage system, constructing a logical-physical address mapping, and synchronizing data to backup disks, the impact of removing sub-healthy disks on business operations is resolved, achieving silent replacement and stable performance.

CN122309253A · Pending · Publication Date: 2026-06-30 · BEIJING LINX SOFTWARE CORP

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
BEIJING LINX SOFTWARE CORP
Filing Date
2026-04-13
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In existing technologies, removing sub-healthy disks from a distributed storage system disrupts the processes that manage them and causes data imbalance, which consumes network bandwidth and computing resources and degrades the performance and continuity of upper-layer services.

Method used

By monitoring the health status of disks, a target backup disk for sub-healthy disks is identified, and a mapping relationship between logical addresses and physical addresses is established. Data is synchronized to the backup disk, address mapping information is updated, IO requests are handled in a targeted manner, and the sub-healthy disk is removed after data synchronization is completed.

Benefits of technology

It enables silent replacement of sub-healthy disks without affecting the continuity of upper-layer services, avoiding data reconstruction operations and keeping the performance of the distributed storage system stable.

✦ Generated by Eureka AI based on patent content.

Abstract

This application discloses a disk processing method, a distributed storage system, a device, and a storage medium. The method includes: analyzing the health status of each disk based on its operating parameters; if a disk is determined to be in a sub-healthy state, determining a target backup disk based on the attribute information of the sub-healthy disk; constructing a mapping relationship between the logical address of the sub-healthy disk and the physical address of the target backup disk; updating the address mapping information according to the mapping relationship; synchronizing the data stored on the sub-healthy disk to the target backup disk; performing targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information; and removing the sub-healthy disk from the distributed storage system after data synchronization is complete. The method achieves silent disk replacement before the sub-healthy disk is removed, which avoids affecting the continuity of processes and upper-layer services and avoids affecting the performance of the distributed storage system.

Description

Technical Field

[0001] This application relates to the field of computer technology, and in particular to a disk processing method, a distributed storage system, a device, and a storage medium.

Background Technology

[0002] With the development of cloud computing and big data technologies and the widespread application of distributed storage systems, the cluster size and data volume of distributed storage systems continue to grow, placing higher demands on the availability, performance, and stability of distributed storage systems. As the fundamental storage medium of distributed storage systems, the health of disks has a significant impact on the stable operation of the distributed storage system cluster. In actual operation, disks experiencing health problems often do not fail suddenly, but rather go through a "sub-healthy" phase, manifested as increased read/write latency, higher IO error rates, and fluctuating response times.

[0003] Currently, distributed storage systems often treat sub-healthy disks like faulty disks, directly marking them as faulty and removing them from the cluster. However, removing a sub-healthy disk disrupts the processes managing it and leads to data imbalance within the cluster. This can trigger large-scale rebalancing and cross-node migration, which severely consumes the network bandwidth and computing resources of the distributed storage system. Consequently, the performance of upper-layer services drops, response latency increases, and the continuity of business operations is affected.

Summary of the Invention

[0004] In view of the above-mentioned defects or deficiencies in the prior art, it is desirable to provide a disk processing method, a distributed storage system, a device, and a storage medium that can achieve silent disk replacement before a sub-healthy disk is removed, so as to avoid affecting the continuity of processes and upper-layer services and avoid affecting the performance of the distributed storage system.

[0005] In a first aspect, this application provides a disk processing method applied to a distributed storage system, the distributed storage system including multiple storage nodes, with multiple disks deployed on each storage node. The method includes: analyzing the operating parameters of each disk to determine the health status of the disk; if a disk is determined to be in a sub-healthy state and is therefore a sub-healthy disk, determining the target backup disk of the sub-healthy disk based on the attribute information of the sub-healthy disk; determining the logical address of the sub-healthy disk based on address mapping information, constructing a mapping relationship between the logical address and the physical address of the target backup disk, and updating the address mapping information according to the mapping relationship, where the address mapping information includes mapping relationships between multiple logical addresses and the physical addresses of disks; synchronizing the data stored on the sub-healthy disk to the target backup disk, and performing targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information; and, after the data synchronization is completed, removing the sub-healthy disk from the distributed storage system. Targeted processing means that an IO request is processed either by the sub-healthy disk or by the target backup disk.

[0006] In conjunction with the first aspect, in one possible implementation, performing targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information includes: performing targeted processing on the IO requests based on the data synchronization status, the request type of the IO request, and the updated address mapping information.

[0007] In conjunction with the first aspect, in one possible implementation, performing targeted processing on the IO requests based on the data synchronization status, the request type of the IO request, and the updated address mapping information includes: if the data synchronization status indicates that the data stored on the sub-healthy disk has been synchronized to the target backup disk, or the IO request type is a write request, directing the IO request to the target backup disk for processing according to the mapping relationship in the updated address mapping information; and if the data synchronization status indicates that the data stored on the sub-healthy disk has not been synchronized to the target backup disk, and the IO request type is a read request, directing the IO request to the sub-healthy disk for processing according to the updated address mapping information.

[0008] In conjunction with the first aspect, in one possible implementation, the disk processing method further includes: if the data synchronization status indicates that the data stored on the sub-healthy disk has not been synchronized to the target backup disk, and the IO request type is a read request, directing the IO request to the sub-healthy disk for processing according to the updated address mapping information, and synchronizing the remaining unsynchronized data on the sub-healthy disk starting from the data corresponding to the read request.

[0009] In conjunction with the first aspect, in one possible implementation, determining the target backup disk for the sub-healthy disk based on the attribute information of the sub-healthy disk includes: obtaining attribute information of multiple backup disks included in the storage node where the sub-healthy disk is located; the attribute information includes at least one of capacity and media type; and determining the backup disk whose attribute information matches that of the sub-healthy disk as the target backup disk.

[0010] In conjunction with the first aspect, in one possible implementation, determining the health status of each disk based on its operating parameters includes: obtaining the operating parameters of each disk; and, if the operating parameters are within a preset sub-healthy range, determining that the disk is in a sub-healthy state and is a sub-healthy disk.

[0011] In a second aspect, this application also provides a distributed storage system. The distributed storage system includes multiple storage nodes, multiple disks are deployed on each storage node, and an operating system runs on each storage node. The operating system includes a kernel and a user space, and the kernel includes a virtual bus layer. The user space is used to analyze the operating parameters of each disk to determine its health status. If a disk is in a sub-healthy state, the user space determines the target backup disk based on the attribute information of the sub-healthy disk, generates a disk replacement command based on the target backup disk, and sends the command to the virtual bus layer. In response to the disk replacement command, the virtual bus layer determines the logical address of the sub-healthy disk based on address mapping information, constructs a mapping relationship between the logical address and the physical address of the target backup disk, updates the address mapping information according to the mapping relationship, synchronizes the data stored on the sub-healthy disk to the target backup disk, and performs targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information. After data synchronization is completed, the sub-healthy disk is removed from the distributed storage system. The address mapping information includes mapping relationships between multiple logical addresses and the physical addresses of the disks.

[0012] In conjunction with the second aspect, in one possible implementation, the virtual bus layer is also used to obtain the physical addresses of each disk in the storage node, create logical addresses of the physical addresses, construct address mapping information based on the mapping relationship between the physical addresses and the logical addresses, and send the address mapping information to the user space.

[0013] In a third aspect, this application also provides a computer device. The computer device includes a memory and a processor, the memory storing a computer program, and the processor executing the computer program to implement the method described in the first aspect.

[0014] In a fourth aspect, this application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the method described in the first aspect.

[0015] In a fifth aspect, this application also provides a computer program product. The computer program product includes a computer program that, when executed by a processor, implements the method described in the first aspect.

[0016] The disk processing method, distributed storage system, device, and storage medium provided in this application can monitor the health status of disks on each storage node in the distributed storage system. When a disk is in a sub-healthy state, a target backup disk is determined based on the attribute information of the sub-healthy disk, and a mapping relationship is constructed between the logical address of the sub-healthy disk and the physical address of the target backup disk, updating the address mapping information. Then, the data stored in the sub-healthy disk is synchronized to the target backup disk, and the IO requests corresponding to the sub-healthy disk are processed in a targeted manner according to the updated address mapping information. After data synchronization is completed, the sub-healthy disk is removed from the distributed storage system. When a disk is detected to be in a sub-healthy state, the method provided in this application can use a backup disk to replace the sub-healthy disk. During the replacement process, the sub-healthy disk and the backup disk can be controlled to cooperate to continue executing the corresponding processes and responding to the corresponding IO requests, avoiding the impact of disk replacement on the continuity of upper-layer services, and achieving silent disk replacement (i.e., disk replacement without the upper-layer services noticing). Moreover, by connecting the backup disk and completing data synchronization before removing the sub-healthy disk, the consistency of data distribution in the distributed storage system can be ensured, avoiding data reconstruction operations and thus avoiding impact on the performance of the distributed storage system.

Description of the Drawings

[0017] Other features, objects, and advantages of this application will become more apparent from the following detailed description of non-limiting embodiments with reference to the accompanying drawings:

Figure 1 is a schematic diagram of the architecture of a distributed storage system in one embodiment;
Figure 2 is a flowchart of a disk processing method in one embodiment;
Figure 3 is another flowchart of the disk processing method in one embodiment;
Figure 4 is another flowchart of the disk processing method in one embodiment;
Figure 5 is another flowchart of the disk processing method in one embodiment;
Figure 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

[0018] The present application will now be described in further detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and not intended to limit it. Furthermore, it should be noted that, for ease of description, only the parts relevant to the invention are shown in the accompanying drawings.

[0019] It should be noted that, unless otherwise specified, the embodiments and features described in this application can be combined with each other. Furthermore, the term "and/or" in this document merely describes an association between related objects and indicates that three relationships can exist. For example, A and/or B can represent: A existing alone, A and B existing simultaneously, or B existing alone. The terms "first," "second," etc., in the specification and claims of the embodiments of this application are used to distinguish different objects, not to describe a specific order of objects.

[0020] With the development of cloud computing and big data technologies and the widespread application of distributed storage systems, the cluster size and data volume of distributed storage systems continue to grow, placing higher demands on the availability, performance, and stability of distributed storage systems. As the fundamental storage medium of distributed storage systems, the health of disks has a significant impact on the stable operation of the distributed storage system cluster. In actual operation, disks experiencing health problems often do not fail suddenly, but rather go through a "sub-healthy" phase, manifested as increased read/write latency, higher IO error rates, and fluctuating response times.

[0021] Currently, distributed storage systems often handle sub-healthy disks by directly marking them as faulty and removing them from the cluster. However, removing sub-healthy disks has several drawbacks. First, it disrupts the processes managing these disks, causing processes such as the Object Storage Device (OSD) daemon to be removed from the cluster. Second, the loss of the storage medium, and with it the data on the sub-healthy disk, can lead to data imbalance within the cluster, triggering a rebalancing mechanism that causes large-scale data migration across nodes and severely consumes network bandwidth and computing resources. Furthermore, the heavy use of underlying resources degrades upper-layer performance, resulting in decreased read/write speeds, drastic response fluctuations, or stuttering, which impacts the continuity of cluster operations. Moreover, data rebalancing can be lengthy, lasting hours or even days, leading to prolonged periods of low performance and affecting numerous business processes. Finally, sub-healthy disks are not completely faulty and often retain some capabilities, such as read functionality; directly removing them results in low disk utilization.

[0022] Based on this, embodiments of this application provide a disk processing method, a distributed storage system, a device, and a storage medium, which can achieve silent disk replacement before a sub-healthy disk is removed, avoiding impact on the continuity of processes and upper-layer services, and avoiding impact on the performance of the distributed storage system.

[0023] The disk processing method provided in the embodiments of this application can be applied to, for example, the distributed storage system shown in Figure 1. The distributed storage system includes multiple storage nodes 10, which can be servers, virtual machines, clients, etc. Regarding the internal structure of each storage node 10 (shown in the dashed box in the figure), each storage node 10 has multiple disks 20 deployed for data storage. Each storage node 10 runs an operating system 30, which includes a kernel 31 and a user space 32. The kernel 31 of the operating system 30 can be configured with a virtual bus layer 33 for replacing sub-healthy disks. The disks 20 can be hard disk drives (HDDs), solid-state drives (SSDs), etc.; the operating system 30 can be a Linux operating system.

[0024] In one embodiment, as shown in Figure 2, a disk processing method is provided. Taking the application of the method to the distributed storage system of Figure 1 as an example, the method includes the following steps: Step 101: Analyze the operating parameters of each disk to determine its health status; if a disk is in a sub-healthy state, determine the target backup disk of the sub-healthy disk based on its attribute information.

[0025] In this embodiment, for each storage node 10, the user space 32 of the operating system 30 running on it can monitor the health status of each disk deployed locally on the storage node 10. Specifically, it can collect the operating parameters of each disk at a preset sampling frequency, such as SMART information, IO latency, remapped sectors, bit error rate, etc. Then, for each operating parameter, it is compared with the pre-configured range of operating parameters when the disk is in a sub-healthy state (which can be called the sub-healthy range corresponding to the operating parameter). If the operating parameters of the disk are within the corresponding sub-healthy range, the disk is determined to be in a sub-healthy state, and the disk is a sub-healthy disk.
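
The range check described above can be sketched as follows. This is a minimal illustration in Python, not the application's implementation: the parameter names, the numeric ranges, and the rule that a single parameter inside its sub-healthy range is enough to flag the disk are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical sub-healthy ranges (inclusive); the source gives no concrete values.
SUB_HEALTHY_RANGES: Dict[str, Tuple[float, float]] = {
    "io_latency_ms": (50.0, 500.0),
    "remapped_sectors": (10.0, 1000.0),
}


def params_in_sub_healthy_range(
    sample: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]] = SUB_HEALTHY_RANGES,
) -> List[str]:
    """Return the names of sampled parameters that fall inside their sub-healthy range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if name in sample and low <= sample[name] <= high
    ]


def is_sub_healthy(
    sample: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]] = SUB_HEALTHY_RANGES,
) -> bool:
    # Assumption: one parameter inside its range is enough to classify the disk.
    return bool(params_in_sub_healthy_range(sample, ranges))
```

A monitor in user space could call `is_sub_healthy` once per sampling interval for each local disk and start the replacement flow when it returns `True`.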

[0026] In one possible implementation, weights can be assigned to each operating parameter based on their impact on disk health; the weights are positively correlated with the degree of impact. Then, a health status score is determined for each operating parameter based on the difference between it and its corresponding sub-health range (an upper limit, lower limit, or median value that characterizes the sub-health range); the health status score is positively correlated with the difference, meaning the smaller the difference, the greater the likelihood that the disk is in a sub-healthy state, and the lower the health status score should be. Next, the health status scores of multiple operating parameters are weighted and summed according to their respective weights to obtain the disk's corresponding health status score. Based on the comparison between the disk's corresponding health status score and a preset health threshold, the disk's health status can be determined. For example, if the disk's health status score is less than the preset health threshold, the disk can be determined to be in a sub-healthy state, and is thus considered a sub-healthy disk.
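
One way to read the weighted scoring above is sketched below. The source does not fix a formula, so several choices here are assumptions: each parameter gets a hypothetical `healthy_at` value in addition to the reference value of its sub-healthy range, the per-parameter score is scaled linearly into [0, 1], and the weighted sum is normalised by the total weight. The weights, reference values, and threshold are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ParamRule:
    weight: float          # larger weight = larger impact on disk health
    sub_healthy_at: float  # reference value characterising the sub-healthy range
    healthy_at: float      # value treated as fully healthy (assumption)


def param_score(value: float, rule: ParamRule) -> float:
    """Score in [0, 1]: 0 at the sub-healthy reference, 1 at the healthy value."""
    distance = (rule.sub_healthy_at - value) / (rule.sub_healthy_at - rule.healthy_at)
    return max(0.0, min(1.0, distance))


def disk_score(sample: Dict[str, float], rules: Dict[str, ParamRule]) -> float:
    """Weighted sum of per-parameter scores, normalised by the total weight."""
    total_weight = sum(rule.weight for rule in rules.values())
    weighted = sum(
        rule.weight * param_score(sample[name], rule) for name, rule in rules.items()
    )
    return weighted / total_weight


def is_sub_healthy(
    sample: Dict[str, float],
    rules: Dict[str, ParamRule],
    health_threshold: float = 0.5,  # hypothetical preset health threshold
) -> bool:
    return disk_score(sample, rules) < health_threshold
```

Because the score shrinks as a parameter approaches its sub-healthy reference, a disk whose heavily weighted parameters are close to that reference falls below the threshold first.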

[0027] If the disk is determined to be in a sub-healthy state based on the above health status analysis, a backup disk, i.e., a target backup disk, needs to be identified to replace the sub-healthy disk. Preferably, a backup disk with the same or similar attribute information can be selected as the target backup disk corresponding to the sub-healthy disk. This ensures that, after disk replacement, processes related to the sub-healthy disk are not affected by the switch to the target backup disk and can still execute efficiently and stably. The attribute information may include the disk's capacity, media type, etc. For example, a backup disk whose capacity is greater than or equal to that of the sub-healthy disk and whose media type is the same as that of the sub-healthy disk can be selected as the target backup disk.
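
The selection rule in the example above (same media type, capacity at least that of the sub-healthy disk) can be sketched as follows. The `Disk` type, its field names, and the tie-break that prefers the smallest sufficient backup disk are assumptions for illustration; the source only requires that the attributes match or are similar.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Disk:
    disk_id: str
    capacity_gb: int
    media_type: str  # e.g. "HDD" or "SSD"


def pick_target_backup(sub_healthy: Disk, spares: Iterable[Disk]) -> Optional[Disk]:
    """Return a spare on the same node that can stand in for the sub-healthy disk."""
    candidates = [
        spare
        for spare in spares
        if spare.media_type == sub_healthy.media_type
        and spare.capacity_gb >= sub_healthy.capacity_gb
    ]
    # Assumption: prefer the smallest spare that is still large enough.
    return min(candidates, key=lambda spare: spare.capacity_gb, default=None)
```

Returning `None` when no spare qualifies leaves the fallback policy to the caller, which the source does not specify.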

[0028] Step 102: Determine the logical address of the sub-healthy disk based on the address mapping information, construct the mapping relationship between the logical address and the physical address of the target backup disk, and update the address mapping information according to the mapping relationship.

[0029] The address mapping information is generated by the kernel 31 of the operating system 30, which identifies the local disks 20 of the storage node 10, abstracts the physical address of each disk 20 into a logical address, constructs the mapping relationship between the physical address and the logical address of the disk 20, and stores it locally. In one possible implementation, a virtual handle (e.g., /dev/vbl) corresponding to the physical address of the disk can also be created, and the address mapping information is then constructed based on the mapping relationship between physical addresses and virtual handles. Either form is sufficient as long as disk-related processes can access the disk through the address mapping information.

[0030] In this embodiment, after determining the sub-healthy disk and its corresponding target backup disk, user space 32 first connects the target backup disk to the program instance to which the sub-healthy disk belongs, so that the upper-layer process can access the target backup disk. Specifically, user space 32 can generate a disk access command based on relevant information of the target backup disk (such as identifier, physical address, etc.) and send the disk access command to kernel 31. In response to the disk access command, kernel 31 first queries the address mapping information according to the physical address of the sub-healthy disk to determine the logical address of the sub-healthy disk, then constructs a mapping relationship between the physical address of the target backup disk and the logical address of the sub-healthy disk, and adds this mapping relationship to the address mapping information to complete the update of the address mapping information.

[0031] Understandably, the updated address mapping information retains the mapping relationship between the physical and logical addresses of the sub-healthy disks, while also maintaining the mapping relationship between the physical address of the newly added target backup disk and the logical address of the sub-healthy disk. In other words, the updated address mapping information includes the mapping relationships between the same logical address and the physical address of the target backup disk, as well as the mapping relationship between the logical address and the physical address of the sub-healthy disk.

[0032] In one possible implementation, an identifier or priority can be added to the newly added mapping relationship (the mapping relationship between the physical address of the target backup disk and the logical address of the sub-healthy disk) to distinguish the mapping relationship between the sub-healthy disk and the target backup disk.
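
The update described in Step 102 can be sketched as a small table in which one logical address temporarily maps to two physical addresses, with a flag playing the role of the identifier on the newly added mapping. The class and method names are hypothetical, and the short strings used as physical addresses in the usage note are placeholders, not the format used by the virtual bus layer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Mapping:
    physical: str
    is_backup: bool = False  # identifier that marks the newly added mapping


class AddressMap:
    """Logical address -> one or two physical addresses (hypothetical structure)."""

    def __init__(self) -> None:
        self._table: Dict[str, List[Mapping]] = {}

    def register(self, logical: str, physical: str) -> None:
        """Record the initial logical-to-physical mapping of a disk."""
        self._table[logical] = [Mapping(physical)]

    def logical_of(self, physical: str) -> str:
        """Look up the logical address that currently maps to a physical address."""
        for logical, mappings in self._table.items():
            if any(m.physical == physical for m in mappings):
                return logical
        raise KeyError(physical)

    def attach_backup(self, sub_healthy_physical: str, backup_physical: str) -> str:
        """Add a second mapping for the sub-healthy disk's logical address."""
        logical = self.logical_of(sub_healthy_physical)
        self._table[logical].append(Mapping(backup_physical, is_backup=True))
        return logical

    def resolve(self, logical: str, backup: bool) -> str:
        """Pick the physical address of the sub-healthy or the backup disk."""
        for m in self._table[logical]:
            if m.is_backup == backup:
                return m.physical
        raise KeyError((logical, backup))
```

For instance, after `register("/dev/vbl", "sdb")` and `attach_backup("sdb", "sdx")`, the logical address `/dev/vbl` resolves to `sdb` with `backup=False` and to `sdx` with `backup=True`, while upper-layer processes keep using the same logical address.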

[0033] Step 103: Synchronize the data stored in the sub-healthy disk to the target backup disk, and perform targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information; after completing the data synchronization, remove the sub-healthy disk from the distributed storage system.

[0034] In this embodiment, after determining the target backup disk corresponding to the sub-healthy disk, the data stored in the sub-healthy disk needs to be synchronized to the target backup disk to ensure the effectiveness of the target backup disk. In one possible implementation, the kernel 31 can construct a memory bitmap to record the data synchronization progress. Specifically, the kernel 31 can sequentially scan the sub-healthy disk, mirror the data stream to the target backup disk, and modify the corresponding bit in the memory bitmap after a certain amount of data synchronization is completed to indicate that the synchronization of that portion of the data is complete.
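
A minimal sketch of the bitmap-tracked sequential copy follows. It uses in-memory `io.BytesIO` objects as stand-ins for the two block devices, and the class name, chunk size, and one-bit-per-chunk layout are assumptions for illustration; the source only says that a bit is modified after a certain amount of data has been synchronized.

```python
import io
from typing import BinaryIO


class MirrorSync:
    """Copy a source device to a target chunk by chunk, one bitmap bit per chunk."""

    def __init__(self, src: BinaryIO, dst: BinaryIO, size: int, chunk_size: int) -> None:
        self.src = src
        self.dst = dst
        self.chunk_size = chunk_size
        self.chunks = (size + chunk_size - 1) // chunk_size
        self.bitmap = bytearray((self.chunks + 7) // 8)
        self.next_chunk = 0

    def is_synced(self, chunk: int) -> bool:
        return bool(self.bitmap[chunk // 8] & (1 << (chunk % 8)))

    def copy_next(self) -> bool:
        """Copy the next unsynchronized chunk; return False when nothing is left."""
        if self.next_chunk >= self.chunks:
            return False
        offset = self.next_chunk * self.chunk_size
        self.src.seek(offset)
        data = self.src.read(self.chunk_size)
        self.dst.seek(offset)
        self.dst.write(data)
        # Mark the chunk as synchronized only after the write has been issued.
        self.bitmap[self.next_chunk // 8] |= 1 << (self.next_chunk % 8)
        self.next_chunk += 1
        return True


# Usage with in-memory stand-ins for the sub-healthy and target backup disks.
source = io.BytesIO(b"abcdefghij")
target = io.BytesIO(bytes(10))
job = MirrorSync(source, target, size=10, chunk_size=4)
while job.copy_next():
    pass
```

Keeping the progress in a bitmap lets the IO path check whether a given region has already been copied without scanning either disk.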

[0035] During data synchronization, to prevent processes related to the sub-healthy disk from being affected by disk replacement and thus impacting the continuity of upper-layer services, it is also necessary to respond to the IO requests generated by these processes. According to the address mapping information, the response to an IO request can be directed to either of two disks (the sub-healthy disk and the target backup disk). In other words, targeted processing means processing by either the sub-healthy disk or the target backup disk. To ensure accurate IO request responses, it is necessary to identify the disk capable of responding to the IO request and then direct the request to that disk based on the mapping relationship in the address mapping information.

[0036] One possible implementation directs IO requests based on the data synchronization status, the type of the IO request, and the updated address mapping information. This approach considers that, during data synchronization, the target backup disk may not yet hold all the data from the sub-healthy disk and therefore cannot respond to every type of IO request. For example, a target backup disk containing only part of the data cannot respond to read requests for data it does not yet hold. Only after data synchronization is complete can the target backup disk respond to all IO requests corresponding to the sub-healthy disk. In other words, determining which disk responds to an IO request requires considering the data synchronization status between the target backup disk and the sub-healthy disk, as well as the type of the IO request. Once the disk is determined, the IO request is directed to it for response based on the updated address mapping information.

[0037] In this embodiment, after data synchronization is complete, all IO requests corresponding to the sub-healthy disk can be redirected to the target backup disk for response, and the sub-healthy disk can be removed from the distributed storage system. At this point, removing the sub-healthy disk will not affect the execution of the process corresponding to the sub-healthy disk, nor will it affect the continuity of upper-layer services. Furthermore, since the target backup disk has already been connected to the process instance and already contains all the data stored on the sub-healthy disk, it will not lead to data reconstruction in the distributed storage system and will not affect the performance of the distributed storage system. Additionally, the mapping relationship between the physical address and logical address of the sub-healthy disk in the address mapping information also needs to be deleted to ensure the uniqueness of the target backup disk.
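
The final cleanup step can be sketched as below: once synchronization is complete, the mapping to the sub-healthy disk is deleted so that the logical address resolves only to the target backup disk. The nested-dictionary layout, the role keys, and the function name are assumptions for illustration.

```python
from typing import Dict


def finalize_replacement(
    address_map: Dict[str, Dict[str, str]],
    logical: str,
    sync_complete: bool,
) -> str:
    """Drop the sub-healthy disk's mapping and return its physical address."""
    if not sync_complete:
        raise RuntimeError("cannot remove the sub-healthy disk before sync completes")
    # After this call the logical address maps only to the target backup disk.
    return address_map[logical].pop("sub_healthy")
```

The returned physical address identifies the disk that can then be removed from the storage node.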

[0038] The method provided in this application can monitor the health status of disks on each storage node in a distributed storage system. When a disk is in a sub-healthy state, a target backup disk is determined based on the attribute information of the sub-healthy disk, and a mapping relationship is constructed between the logical address of the sub-healthy disk and the physical address of the target backup disk, updating the address mapping information. Then, the data stored in the sub-healthy disk is synchronized to the target backup disk, and the IO requests corresponding to the sub-healthy disk are processed in a targeted manner according to the updated address mapping information. After data synchronization is completed, the sub-healthy disk is removed from the distributed storage system. When a disk is detected to be in a sub-healthy state, the method provided in this application can use a backup disk to replace the sub-healthy disk. During the replacement process, the sub-healthy disk and the backup disk can be controlled to cooperate to continue executing the corresponding processes and responding to the corresponding IO requests, avoiding the impact of disk replacement on the continuity of upper-layer services, and achieving silent disk replacement (i.e., disk replacement without the upper-layer services noticing). Moreover, by connecting to the backup disk and completing data synchronization before removing the sub-healthy disk, the consistency of data distribution in the distributed storage system can be ensured, avoiding data reconstruction operations and thus avoiding impact on the performance of the distributed storage system.

[0039] The preceding embodiments described a scheme for targeted processing of IO requests. In another embodiment of this application, the disk that responds to an IO request can be determined from the data synchronization status and the IO request type, and the IO request can then be directed to that disk. For example, the aforementioned "targeted processing of IO requests based on data synchronization status, IO request type, and updated address mapping information" may include the steps shown in Figure 3: Step 201: If the data synchronization status indicates that the data stored on the sub-healthy disk has been synchronized to the target backup disk, or if the IO request type is a write request, the IO request is directed to the target backup disk for processing according to the mapping relationship in the updated address mapping information.

[0040] In this embodiment, once all data stored on the sub-healthy disk has been synchronized to the target backup disk, the target backup disk can respond to all IO requests corresponding to the sub-healthy disk. When the IO request writes data to the disk, i.e., the request type is a write request, the data synchronization progress of the target backup disk does not need to be considered.

[0041] Therefore, in the two scenarios where data synchronization to the target backup disk is complete or the IO request type is a write request, the IO request can be directed straight to the target backup disk for processing. This avoids directing writes to the sub-healthy disk, which could lead to data loss or require resynchronizing the newly written data to the target backup disk and thus consume the computing resources of the storage node or distributed storage system.

[0042] Specifically, based on the identifiers (or whether or not there are identifiers) of the two mapping relationships corresponding to the logical addresses of the sub-healthy disks in the address mapping information, the mapping relationship between the required logical address and the physical address of the target backup disk can be filtered out, and then the physical address of the target backup disk can be accessed so that the target backup disk can respond to IO requests.

[0043] Step 202: If the data synchronization status indicates that the data stored in the sub-healthy disk has not been fully synchronized to the target backup disk, and the IO request type is a read request, the IO request is directed to the sub-healthy disk for processing according to the updated address mapping information.

[0044] In this embodiment, during data synchronization the target backup disk does not yet contain all the data stored on the sub-healthy disk. When the IO request reads data from the disk (i.e., the request type is a read request), the target backup disk may not hold the data required by the IO request and may therefore be unable to respond to it. However, the sub-healthy disk has not yet been removed and still contains all the data, so it can respond to the read request.

[0045] Therefore, in cases where data synchronization in a sub-healthy disk has not been completed on a backup disk, and the IO request type is a read request, the IO request can be directed to the sub-healthy disk for processing. Specifically, based on the identifiers (or whether they exist) of the two mapping relationships corresponding to the logical address of the sub-healthy disk in the address mapping information, the required mapping relationship between the logical address and the physical address of the sub-healthy disk can be filtered out. Then, the physical address of the sub-healthy disk can be accessed, enabling the sub-healthy disk to respond to the IO request.
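The routing rule of Steps 201 and 202 can be illustrated with a minimal Python sketch. The names `RequestType`, `Target`, and `route_io` are hypothetical and do not appear in this application; the sketch only shows the decision logic, not the kernel-level implementation.

```python
from enum import Enum


class RequestType(Enum):
    READ = "read"
    WRITE = "write"


class Target(Enum):
    SUB_HEALTHY = "sub_healthy_disk"
    BACKUP = "target_backup_disk"


def route_io(request_type: RequestType, sync_complete: bool) -> Target:
    """Pick the disk that serves an IO request during disk replacement.

    Step 201: synchronization complete OR write request -> target backup disk.
    Step 202: synchronization incomplete AND read request -> sub-healthy disk.
    """
    if sync_complete or request_type is RequestType.WRITE:
        return Target.BACKUP
    return Target.SUB_HEALTHY
```

Under these assumptions, only a read issued before synchronization completes reaches the sub-healthy disk; every write lands on the target backup disk, so no newly written data has to be copied again.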

[0046] In one possible implementation, during data synchronization, for an IO request of type read, it can first be determined from the in-memory bitmap whether the data to be read by the IO request has already been synchronized to the target backup disk. If it has been synchronized, the IO request can be directed to the target backup disk for processing.
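A bitmap check of this kind might look like the following Python sketch. It assumes one bit per fixed-size block, which this application does not specify; `SyncBitmap` and `route_read` are hypothetical names used only for illustration.

```python
class SyncBitmap:
    """In-memory bitmap: bit i is set once block i has been copied to the backup disk."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self._bits = bytearray((num_blocks + 7) // 8)

    def mark_synced(self, block: int) -> None:
        self._bits[block // 8] |= 1 << (block % 8)

    def is_synced(self, block: int) -> bool:
        return bool(self._bits[block // 8] & (1 << (block % 8)))

    def range_synced(self, start: int, count: int) -> bool:
        # A read can go to the backup disk only if every block it covers is synced.
        return all(self.is_synced(b) for b in range(start, start + count))


def route_read(bitmap: SyncBitmap, start_block: int, block_count: int) -> str:
    """Return which disk should serve a read covering the given block range."""
    if bitmap.range_synced(start_block, block_count):
        return "target_backup_disk"
    return "sub_healthy_disk"
```

For example, after `mark_synced(0)` and `mark_synced(1)`, a read of blocks 0 to 1 is served by the target backup disk, while a read of blocks 1 to 2 still goes to the sub-healthy disk because block 2 has not been copied.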

[0047] Furthermore, considering factors such as data attributes, data that has just been read is more likely to be read again than other data. Therefore, to improve the reliability of IO request responses, the disk processing method may also include the following step: if the data synchronization status indicates that the data stored in the sub-healthy disk has not been fully synchronized to the target backup disk, and the IO request type is a read request, the IO request is directed to the sub-healthy disk for processing according to the updated address mapping information, and synchronization of the remaining unsynchronized data in the sub-healthy disk starts from the data corresponding to the read request.

[0048] In other words, data on the sub-healthy disk that has not yet been synchronized but has been read once is likely to be read again. To avoid directing subsequent read requests for that data to the sub-healthy disk, when the sub-healthy disk receives a read request (or completes its response to a read request) during data synchronization, the data corresponding to that read request can be synchronized first among the data that has not yet been synchronized. Subsequent read requests for that data can then be directed to the healthy target backup disk, improving data security and the reliability of IO request responses.
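One way to picture this read-driven prioritization is a queue of unsynchronized blocks in which a block that has just been read moves to the front. The Python sketch below is only an illustration under that assumption; `SyncScheduler` and its block-level granularity are hypothetical and not taken from this application.

```python
from collections import deque
from typing import Optional


class SyncScheduler:
    """Decides the order in which unsynchronized blocks are copied to the backup disk."""

    def __init__(self, num_blocks: int) -> None:
        # Default order: ascending block number.
        self._pending = deque(range(num_blocks))

    def on_read(self, block: int) -> None:
        """Move a just-read, still-unsynchronized block to the front of the queue."""
        if block not in self._pending:
            return  # already synchronized (or unknown): nothing to reprioritize
        self._pending.remove(block)
        self._pending.appendleft(block)

    def next_block(self) -> Optional[int]:
        """Return the next block to copy, or None once synchronization is complete."""
        if not self._pending:
            return None
        return self._pending.popleft()
```

With four blocks, copying block 0 and then observing a read of block 3 makes the scheduler copy block 3 next, followed by blocks 1 and 2 in their original order.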

[0049] The method provided in this application embodiment can determine the disk that can effectively respond to IO requests based on the data synchronization status and the request type of IO requests, and direct the IO requests to the corresponding disks for processing according to the updated address mapping information, thereby ensuring the continuity of process execution of the sub-healthy disks during the data synchronization process and avoiding impact on upper-layer services.

[0050] The embodiments described above illustrate a scheme for determining a target backup disk to replace a sub-healthy disk. In another embodiment of this application, the target backup disk can be determined based on how well its attribute parameters match. For example, the aforementioned "determining the target backup disk for a sub-healthy disk based on the attribute information of the sub-healthy disk" may include the steps shown in Figure 4, as follows: Step 301: Obtain the attribute information of multiple backup disks included in the storage node where the sub-healthy disk is located.

[0051] The attribute information includes at least one of capacity and media type.

[0052] In this embodiment, multiple backup disks can be reserved for each storage node when constructing the distributed storage system. When user space 32 determines that there is a sub-healthy disk in its storage node, it can obtain the attribute information of at least one of the multiple backup disks included in the storage node and perform attribute information matching. Preferably, the attribute information of the multiple backup disks included in the storage node containing the sub-healthy disk is obtained, and a target backup disk is matched from among them to avoid data synchronization across nodes. This ensures that data is transferred over the SATA/SAS/NVMe bus within the storage node, without consuming the network bandwidth of the distributed storage system, and avoids impacting the performance of the distributed storage system.

[0053] Step 302: Among the multiple backup disks, the backup disk whose attribute information matches the attribute information of the sub-healthy disk is identified as the target backup disk.

[0054] In this embodiment, matching rules can be set based on the attribute information of the sub-healthy disk, such as a capacity greater than or equal to the capacity of the sub-healthy disk, or a media type that is the same as the media type of the sub-healthy disk. Among the multiple backup disks, the backup disk whose attribute information satisfies the matching rules is determined as the target backup disk. Preferably, the matching rule can be a capacity greater than or equal to the capacity of the sub-healthy disk together with a media type that is the same as the media type of the sub-healthy disk. Alternatively, the matching rule can be a capacity greater than or equal to the capacity of the sub-healthy disk, or a capacity whose difference from the capacity of the sub-healthy disk is less than a preset difference (without restricting the media type).

[0055] In one possible implementation, if no target backup disk is matched, an alarm operation can be performed, such as displaying alarm information on the client screen or generating an alarm report and sending it to the operations and maintenance personnel's email address, so as to deal with the sub-healthy disk in a timely manner.
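The preferred matching rule (capacity at least that of the sub-healthy disk and identical media type) can be sketched in Python as follows. The `Disk` record, the function name `pick_target_backup`, and the tie-breaking choice of the smallest sufficient backup disk are assumptions made for illustration; this application does not state how to choose among several matching disks.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Disk:
    name: str
    capacity_gb: int
    media_type: str  # e.g. "HDD" or "SSD"


def pick_target_backup(sub_healthy: Disk, backups: Iterable[Disk]) -> Optional[Disk]:
    """Return a backup disk on the same node that matches the sub-healthy disk.

    Returns None when nothing matches, in which case the caller would raise an alarm.
    """
    candidates = [
        d for d in backups
        if d.capacity_gb >= sub_healthy.capacity_gb
        and d.media_type == sub_healthy.media_type
    ]
    if not candidates:
        return None
    # Assumption: prefer the smallest disk that is still large enough.
    return min(candidates, key=lambda d: d.capacity_gb)
```

A `None` result corresponds to the alarm branch described above, for example displaying alarm information or notifying operations and maintenance personnel.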

[0056] The method provided in this application embodiment can obtain attribute information of multiple backup disks included in the storage node where the sub-healthy disk is located. Among the multiple backup disks, the backup disk whose attribute information matches that of the sub-healthy disk is determined as the target backup disk. This application embodiment can determine backup disks with the same or similar attribute information as the target backup disk corresponding to the sub-healthy disk, so that after disk replacement, the processes related to the sub-healthy disk are not affected by the target backup disk and can still execute efficiently and stably.

[0057] The embodiments described above introduce a scheme for analyzing disk health status. In another embodiment of this application, whether a disk is in a sub-healthy state can be determined by comparing its operating parameters with preset value ranges. For example, the aforementioned "analyzing the health status of disks based on their operating parameters to determine their health status" may include the steps shown in Figure 5, as follows: Step 401: Obtain the operating parameters of each disk.

[0058] Step 402: If the operating parameters are within the preset sub-health range, the disk is determined to be in a sub-healthy state and is a sub-healthy disk.

[0059] The sub-health range represents the range of operating parameters corresponding to a disk being in a sub-healthy state. This range can be pre-set by professionals and stored locally on the storage node.

[0060] In this embodiment, user space 32 can collect operating parameters of each disk deployed on the storage node, such as SMART information, IO latency, remapped sectors, and bit error rate. Each operating parameter is then compared with its corresponding sub-health range. If any operating parameter falls within its corresponding sub-health range, the disk is determined to be in a sub-healthy state and is designated as a sub-healthy disk. Alternatively, if the number of operating parameters that fall within their corresponding sub-health ranges is greater than or equal to a preset number, the disk is determined to be in a sub-healthy state and is designated as a sub-healthy disk.
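Both variants of the check (any parameter in range, or at least a preset number of parameters in range) can be expressed with a single threshold, as in the following Python sketch. The function name, the parameter names, and the numeric ranges in the usage note are hypothetical; actual sub-health ranges would be preset by professionals and stored on the storage node.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]  # inclusive (low, high) bounds of a sub-health range


def is_sub_healthy(
    params: Dict[str, float],
    sub_health_ranges: Dict[str, Range],
    min_hits: int = 1,
) -> bool:
    """Return True if at least `min_hits` operating parameters fall in their sub-health range.

    min_hits=1 models the "any parameter" rule; a larger value models the
    "preset number of parameters" rule.
    """
    hits = 0
    for name, value in params.items():
        bounds = sub_health_ranges.get(name)
        if bounds is None:
            continue  # no sub-health range configured for this parameter
        low, high = bounds
        if low <= value <= high:
            hits += 1
    return hits >= min_hits
```

For instance, with illustrative ranges `{"io_latency_ms": (50, 500), "remapped_sectors": (10, 1000)}`, a disk reporting a latency of 80 ms and 2 remapped sectors is sub-healthy under `min_hits=1` but not under `min_hits=2`.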

[0061] The method provided in this application embodiment can analyze the health status of a disk based on the comparison results between the disk's operating parameters and the corresponding pre-set sub-health range, and can accurately and efficiently determine the sub-healthy disk.

[0062] In one embodiment, as shown in Figure 1, a distributed storage system is provided, comprising multiple storage nodes 10, each storage node 10 deploying multiple disks 20, with an operating system 30 running on each storage node 10. The operating system 30 includes a kernel 31 and a user space 32, and the kernel 31 includes a virtual bus layer 33. The user space 32 is used to analyze the health status of each disk 20 based on its operating parameters; if a disk 20 is determined to be in a sub-healthy state, to determine a target backup disk based on the attribute information of the sub-healthy disk; and to generate a disk replacement instruction based on the target backup disk and send it to the virtual bus layer 33. The virtual bus layer 33 is used to respond to the disk replacement instruction by determining the logical address of the sub-healthy disk based on address mapping information, constructing a mapping relationship between the logical address and the physical address of the target backup disk, updating the address mapping information according to the mapping relationship, synchronizing the data stored in the sub-healthy disk to the target backup disk, and performing targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information. After the data synchronization is complete, the sub-healthy disk is removed from the distributed storage system. The address mapping information includes multiple mapping relationships between logical addresses and physical addresses of disks.

[0063] In this embodiment, the virtual bus layer 33 is further configured to obtain the physical addresses of each disk 20 in the storage node 10, create logical addresses of the physical addresses, construct address mapping information based on the mapping relationship between the physical addresses and the logical addresses, and send the address mapping information to the user space 32. This allows the user space 32 to determine the logical address of the sub-healthy disk and its corresponding target backup disk based on the address mapping information, and generate a disk replacement instruction to instruct the virtual bus layer 33 to connect the target backup disk to the virtual bus layer 33 (i.e., construct the mapping relationship between the logical address of the sub-healthy disk and the physical address of the target backup disk, and update the address mapping information).
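The address mapping information can be pictured as a table that holds one mapping per logical address in normal operation and two during replacement. The Python sketch below is a simplified user-level model, not the kernel data structure; the class `AddressMap`, the `vdiskN` logical names, and the device paths in the usage note are hypothetical.

```python
from typing import Dict, List, Optional


class AddressMap:
    """Logical-to-physical address table kept by the virtual bus layer (simplified model)."""

    def __init__(self, physical_addresses: List[str]) -> None:
        # One logical address is created per physical disk address.
        self._primary: Dict[str, str] = {
            f"vdisk{i}": phys for i, phys in enumerate(physical_addresses)
        }
        # Second mapping, present only while a disk is being replaced.
        self._backup: Dict[str, str] = {}

    def logical_of(self, physical: str) -> Optional[str]:
        for logical, phys in self._primary.items():
            if phys == physical:
                return logical
        return None

    def attach_backup(self, logical: str, backup_physical: str) -> None:
        if logical not in self._primary:
            raise KeyError(logical)
        self._backup[logical] = backup_physical

    def resolve(self, logical: str, prefer_backup: bool = False) -> str:
        """Return the physical address that should serve an IO request."""
        if prefer_backup and logical in self._backup:
            return self._backup[logical]
        return self._primary[logical]

    def finish_replacement(self, logical: str) -> None:
        # Drop the sub-healthy disk's mapping; the backup disk becomes the only mapping.
        self._primary[logical] = self._backup.pop(logical)
```

Because upper layers address the disk only through its logical address, swapping the physical address behind it (for example from `/dev/sdb` to a backup disk at `/dev/sdz`) is invisible to them.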

[0064] That is, in this embodiment, a virtual bus layer 33 and its driver module are added to the kernel space of storage node 10, and the virtual bus layer 33 is configured to build and manage the address mapping information. Thus, when the kernel executes the driver module of the virtual bus layer 33, the virtual bus layer 33 is started and performs the operations of the kernel 31 in the above embodiments (i.e., connecting the target backup disk to the virtual bus layer, controlling the data synchronization between the sub-healthy disk and the target backup disk, directing the IO requests corresponding to the sub-healthy disk, removing the sub-healthy disk, and removing the mapping relationship corresponding to the sub-healthy disk from the address mapping information, etc.).

[0065] In this system, the virtual bus layer 33 can be a block device. When there are no sub-healthy disks in storage node 10 of the distributed storage system, the virtual bus layer 33 of storage node 10 can remain in a "pass-through mode," meaning it directs IO requests to the corresponding disk based on the address mapping information. When there is a sub-healthy disk in storage node 10, the virtual bus layer 33 of storage node 10 can switch to a "migration mode." In response to a disk replacement instruction, it determines the logical address of the sub-healthy disk based on the address mapping information, establishes a mapping relationship between the logical address and the physical address of the target backup disk, updates the address mapping information according to the mapping relationship, synchronizes the data stored in the sub-healthy disk to the target backup disk, and directs the IO requests corresponding to the sub-healthy disk according to the updated address mapping information. After data synchronization is complete, the sub-healthy disk is removed from the distributed storage system. After the sub-healthy disk is removed, the virtual bus layer 33 can switch back to the "pass-through mode."
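The mode switching described above amounts to a small state machine. The following Python sketch is a user-level illustration only; the class and method names are hypothetical, and tracking several concurrent replacements in a set is an assumption that this application does not address.

```python
from enum import Enum


class Mode(Enum):
    PASS_THROUGH = "pass-through"
    MIGRATION = "migration"


class VirtualBusLayerModel:
    """Tracks whether the virtual bus layer is passing IO through or migrating a disk."""

    def __init__(self) -> None:
        self.mode = Mode.PASS_THROUGH
        self._migrating = set()  # logical addresses currently being replaced

    def on_replace_instruction(self, logical: str) -> None:
        # A disk replacement instruction arrives from user space.
        self._migrating.add(logical)
        self.mode = Mode.MIGRATION

    def on_disk_removed(self, logical: str) -> None:
        # Synchronization finished and the sub-healthy disk was removed.
        self._migrating.discard(logical)
        if not self._migrating:
            self.mode = Mode.PASS_THROUGH
```

In this model the layer returns to pass-through mode only when no replacement is still in progress.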

[0066] The distributed storage system provided in this application embodiment can monitor the health status of disks on each storage node. When a disk is in a sub-healthy state, a target backup disk is determined based on the attribute information of the sub-healthy disk, and a mapping relationship is constructed between the logical address of the sub-healthy disk and the physical address of the target backup disk, updating the address mapping information. Then, the data stored in the sub-healthy disk is synchronized to the target backup disk, and the IO requests corresponding to the sub-healthy disk are processed in a targeted manner according to the updated address mapping information. After data synchronization is completed, the sub-healthy disk is removed from the distributed storage system. When the distributed storage system provided in this application embodiment detects that a disk is in a sub-healthy state, it can use a backup disk to replace the sub-healthy disk. During the replacement process, the sub-healthy disk and the backup disk can be controlled to cooperate to continue executing the corresponding processes and responding to the corresponding IO requests, avoiding the impact of disk replacement on the continuity of upper-layer services, and realizing silent disk replacement (i.e., disk replacement without the upper-layer services noticing). Moreover, by connecting the backup disk and completing data synchronization before removing the sub-healthy disk, the consistency of data distribution in the distributed storage system can be ensured, avoiding data reconstruction operations and thus avoiding impact on the performance of the distributed storage system.

[0067] It should be noted that although the operations of the method of the present invention are described in a specific order in the accompanying drawings, this does not require or imply that these operations must be performed in that specific order, or that all of the operations shown must be performed to achieve the desired result. On the contrary, the steps depicted in the flowchart may be performed in a different order. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be broken down into multiple steps.

[0068] Reference is now made to Figure 6, which shows a schematic diagram of the structure of a computer system 500 suitable for implementing terminal devices or servers in the embodiments of this application.

[0069] As shown in Figure 6, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 502 or programs loaded from storage section 508 into random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, ROM 502, and RAM 503 are interconnected via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

[0070] The following components are connected to I/O interface 505: an input section 506 including a keyboard, mouse, etc.; an output section 507 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card, modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to I/O interface 505 as needed. A removable medium 511, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., is installed on drive 510 as needed so that computer programs read from it can be installed into storage section 508 as needed.

[0071] In particular, according to embodiments of this disclosure, the process described above with reference to Figure 2 can be implemented as a computer software program. For example, embodiments of this disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in Figure 2. In such an embodiment, the computer program can be downloaded and installed from a network via communication section 509, and/or installed from removable media 511.

[0072] It should be noted that the computer-readable medium shown in this application can be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. Computer-readable signal media can also be any computer-readable medium other than computer-readable storage media, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.

[0073] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.

[0074] The units or modules described in the embodiments of this application can be implemented in software or hardware. The described units or modules can also be located in a processor. The names of these units or modules do not, in certain circumstances, constitute a limitation on the unit or module itself.

[0075] In another aspect, this application also provides a computer-readable storage medium, which may be included in the computer device described in the above embodiments, or may exist independently without being assembled into the computer device. The aforementioned computer-readable storage medium stores one or more programs that, when executed by one or more processors, perform the methods described in this application. For example, they may perform the steps of the method shown in Figure 2.

[0076] This application provides a computer program product including instructions that, when executed, cause the method described in this application to be performed. For example, the steps of the method shown in Figure 2 may be performed.

[0077] Those skilled in the art will understand that all or part of the processes in the above embodiments can be implemented by a computer program instructing related hardware. The computer program can be stored in a non-volatile computer-readable storage medium. When executed, the computer program can include the processes of the embodiments described above. Any references to memory, databases, or other media used in the embodiments provided in this application can include at least one of non-volatile and volatile memory. Non-volatile memory can include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetic random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, etc. Volatile memory can include random access memory (RAM) or external cache memory, etc. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases involved in the embodiments provided in this application may include at least one type of relational database and non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided in this application may be general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, quantum computing-based data processing logic devices, etc., and are not limited to these.

[0078] The above description is merely a preferred embodiment of this application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in this application is not limited to technical solutions formed by specific combinations of the above-described technical features, but should also cover other technical solutions formed by arbitrary combinations of the above-described technical features or their equivalents without departing from the inventive concept. For example, technical solutions formed by substituting the above features with (but not limited to) technical features with similar functions disclosed in this application.

Claims

1. A disk processing method, characterized in that it is applied to a distributed storage system comprising multiple storage nodes, each storage node deploying multiple disks, the method comprising: analyzing the health status of each disk based on its operating parameters to determine the health status of the disk; if the health status indicates that the disk is in a sub-healthy state, determining a target backup disk for the sub-healthy disk based on the attribute information of the sub-healthy disk; determining the logical address of the sub-healthy disk based on address mapping information, constructing a mapping relationship between the logical address and the physical address of the target backup disk, and updating the address mapping information based on the mapping relationship, wherein the address mapping information includes multiple mapping relationships between logical addresses and physical addresses of disks; and synchronizing the data stored in the sub-healthy disk to the target backup disk, performing targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information, and removing the sub-healthy disk from the distributed storage system after the data synchronization is completed, wherein the targeted processing includes processing by the sub-healthy disk or by the target backup disk.

2. The method according to claim 1, characterized in that the step of performing targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information includes: performing targeted processing on the IO request based on the data synchronization status, the request type of the IO request, and the updated address mapping information.

3. The method according to claim 2, characterized in that the step of performing targeted processing on the IO request based on the data synchronization status, the request type of the IO request, and the updated address mapping information includes: if the data synchronization status indicates that the data stored in the sub-healthy disk has been synchronized to the target backup disk, or if the request type of the IO request is a write request, directing the IO request to the target backup disk for processing according to the mapping relationship in the updated address mapping information; and if the data synchronization status indicates that the data stored in the sub-healthy disk has not been fully synchronized to the target backup disk, and the request type of the IO request is a read request, directing the IO request to the sub-healthy disk for processing according to the updated address mapping information.

4. The method according to claim 3, characterized in that the method further includes: if the data synchronization status indicates that the data stored in the sub-healthy disk has not been fully synchronized to the target backup disk, and the request type of the IO request is a read request, directing the IO request to the sub-healthy disk for processing according to the updated address mapping information, and, for the remaining data in the sub-healthy disk that has not been synchronized, starting synchronization from the data corresponding to the read request.

5. The method according to claim 1, characterized in that the step of determining the target backup disk for the sub-healthy disk based on the attribute information of the sub-healthy disk includes: obtaining attribute information of multiple backup disks included in the storage node where the sub-healthy disk is located, the attribute information including at least one of capacity and media type; and determining, among the multiple backup disks, the backup disk whose attribute information matches that of the sub-healthy disk as the target backup disk.

6. The method according to claim 1, characterized in that the step of analyzing the health status of each disk based on its operating parameters to determine the health status of the disk includes: obtaining the operating parameters of each disk; and if the operating parameters are within the preset sub-health range, determining that the disk is in a sub-healthy state and is a sub-healthy disk.

7. A distributed storage system, characterized in that the distributed storage system includes multiple storage nodes, each storage node deploys multiple disks, and an operating system runs on each storage node; the operating system includes a kernel and a user space, and the kernel includes a virtual bus layer; the user space is used to analyze the health status of each disk based on its operating parameters to determine the health status of the disk, and, if the health status indicates that the disk is in a sub-healthy state, to determine a target backup disk for the sub-healthy disk based on the attribute information of the sub-healthy disk, generate a disk replacement instruction based on the target backup disk, and send it to the virtual bus layer; the virtual bus layer is used to respond to the disk replacement instruction by determining the logical address of the sub-healthy disk according to address mapping information, constructing a mapping relationship between the logical address and the physical address of the target backup disk, updating the address mapping information according to the mapping relationship, synchronizing the data stored in the sub-healthy disk to the target backup disk, performing targeted processing on the IO requests corresponding to the sub-healthy disk according to the updated address mapping information, and removing the sub-healthy disk from the distributed storage system after data synchronization is completed; and the address mapping information includes multiple mapping relationships between logical addresses and physical addresses of disks.

8. The system according to claim 7, characterized in that the virtual bus layer is also used to obtain the physical address of each disk in the storage node, create a logical address for each physical address, construct the address mapping information according to the mapping relationship between the physical address and the logical address, and send the address mapping information to the user space.

9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, characterized in that when the processor executes the computer program, it implements the steps of the method according to any one of claims 1 to 6.

10. A computer-readable storage medium having a computer program stored thereon, characterized in that when the computer program is executed by a processor, it implements the steps of the method according to any one of claims 1 to 6.