Distributed cluster disk fault positioning method, system, device and storage medium

By establishing a set of disk indicator commands in a distributed cluster system, the problem of being unable to locate disk failures was solved, enabling rapid location of faulty disks and improving maintenance efficiency.

CN115202983B · Active · Publication Date: 2026-06-30 · JINAN INSPUR DATA TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
JINAN INSPUR DATA TECH CO LTD
Filing Date
2022-07-29
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

In a distributed storage system, a disk failure does not automatically trigger the fault-light illumination process, so the cluster management software cannot determine the disk's exact location in the data center, which reduces maintenance efficiency.

Method used

By pre-collecting the light-up commands of each disk under different scenarios, a correspondence between fault information and light-up commands is established. The fault information of the faulty disk is obtained and the light-up command is sent to make the faulty disk light up.

Benefits of technology

It enables rapid location of faulty disks, improving maintenance efficiency and reducing maintenance time.


Abstract

This application discloses a method, system, device, and computer-readable storage medium for locating disk faults in a distributed cluster. The method includes: acquiring fault information of a faulty disk in a node; matching the fault information against a preset disk light-up command set to find the light-up command corresponding to the faulty disk; and sending the light-up command to the faulty disk so that it illuminates its fault indicator light. The disk light-up command set includes the correspondence between fault information and disk light-up commands. This application pre-collects the light-up commands for each disk under various scenarios and establishes a correspondence between fault information and light-up commands to obtain the disk light-up command set. Based on the fault information, the corresponding light-up command can be found and then sent to the faulty disk to illuminate its fault indicator light. This allows maintenance personnel to quickly find the physical installation location of the faulty disk from the indicator light, improving maintenance efficiency.

Description

Technical Field

[0001] This application relates to the field of distributed storage system technology, and in particular to a method, system, device, and computer-readable storage medium for locating disk faults in a distributed cluster.

Background Technology

[0002] In the context of massive data environments, clusters are becoming increasingly larger and more complex. While demanding high scalability and adaptability from the system, the requirements for the management and operation of distributed clusters are also becoming increasingly higher and more important.

[0003] In distributed storage systems, cluster data is typically stored on disks, and disk health is crucial for ensuring the cluster's normal operation. Therefore, the disk status in the cluster needs to be monitored in real time, and when disk health issues arise, alarms must be promptly reported to the cluster management software platform. However, since a disk failure does not automatically trigger a fault-light illumination process, when the cluster management software receives a disk alarm from a node in the cluster, it cannot determine the disk's exact location in the data center from the disk name alone. This makes subsequent disk replacement inconvenient. Furthermore, the cluster management software cannot update the disk health status because it cannot obtain the disk's alarm status, and cluster maintenance personnel cannot obtain the true operational status of the disks through the management software.

[0004] Therefore, a method is needed that can illuminate a fault indicator light after a disk failure to facilitate quick location of the faulty disk.

Summary of the Invention

[0005] In view of this, the purpose of this application is to provide a method, system, device, and computer-readable storage medium for locating disk faults in a distributed cluster, which can illuminate a fault indicator light after a disk failure to facilitate rapid location of the faulty disk and improve maintenance efficiency. The specific solution is as follows:

[0006] A method for locating disk faults in a distributed cluster includes:

[0007] Obtain fault information of the faulty disk in the node;

[0008] The fault information is used to match the preset set of disk light-up commands to find the light-up command corresponding to the faulty disk.

[0009] Send the light-up command to the faulty disk to make the faulty disk light up;

[0010] The disk light-up command set includes the correspondence between fault information and disk light-up commands.

[0011] Optionally, the process of obtaining fault information of the faulty disk in the node includes:

[0012] Obtain the disk operation information in the node;

[0013] Analyze the operational information to determine if the disk is faulty;

[0014] If the disk fails, the disk is identified as the failed disk, and failure information for the failed disk is generated.

[0015] Optionally, the process of analyzing the operational information to determine whether the disk is faulty includes:

[0016] Analyze the operational information to determine whether the disk seek failure rate, disk uncorrectable errors, total disk write volume, NVMe SSD media usage, NVMe SSD media errors, and SSD wear indicators exceed preset failure thresholds.

[0017] Optionally, the process of generating the fault information of the faulty disk includes:

[0018] Obtain the node model and operating system of the node where the faulty disk is located;

[0019] Obtain the disk information of the faulty disk;

[0020] Fault information is generated, including the node model, the operating system, and the disk information.

[0021] Optionally, the process of generating the disk lighting instruction set includes:

[0022] Obtain the disk light-up commands corresponding to different operating systems for each disk model on different node models;

[0023] Establish the correspondence between disk model, node model, operating system and disk lighting command to obtain the disk lighting command set.

[0024] Optionally, the method further includes:

[0025] Obtain the disk log information in the node;

[0026] Based on the log information, query and obtain the light status of the disk;

[0027] The lighting status is displayed to the user terminal for the user to view.

[0028] Optionally, the method further includes:

[0029] Obtain the attribute information of the disk in the node;

[0030] The attribute information is used to match the preset disk light status query instruction set to find the light query instruction corresponding to the disk.

[0031] Send the light-up query command to the disk;

[0032] The system receives the lighting status feedback from the disk and displays the lighting status to the user terminal for the user to view.

[0033] Optionally, the process of receiving the lamp status feedback from the disk and displaying the lamp status to the user terminal includes:

[0034] Receive the LED status feedback from the disk;

[0035] Save the lighting status to the database;

[0036] Query the lighting status recorded in the database and display it on the user terminal.

[0037] This application also discloses a distributed cluster disk fault location system, including:

[0038] The fault information acquisition module is used to acquire fault information of faulty disks in the node;

[0039] The light-up command query module is used to match the fault information in a preset set of disk light-up commands to find the light-up command corresponding to the faulty disk.

[0040] A light-up command sending module is used to send the light-up command to the faulty disk so that the faulty disk lights up the fault light;

[0041] The disk light-up command set includes the correspondence between fault information and disk light-up commands.

[0042] This application also discloses a distributed cluster disk fault location device, comprising:

[0043] Memory, used to store computer programs;

[0044] A processor is used to execute the computer program to implement the distributed cluster disk fault location method as described above.

[0045] This application also discloses a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the distributed cluster disk fault location method as described above.

[0046] In this application, a method for locating disk faults in a distributed cluster includes: obtaining fault information of a faulty disk in a node; matching the fault information with a preset set of disk lighting instructions to find the lighting instruction corresponding to the faulty disk; and sending the lighting instruction to the faulty disk to make the faulty disk light up its fault indicator; wherein, the set of disk lighting instructions includes the correspondence between fault information and disk lighting instructions.

[0047] This application pre-collects disk light-up commands for each disk in various scenarios and establishes a correspondence between fault information and light-up commands to obtain a set of disk light-up commands. Based on the fault information, the light-up command corresponding to the faulty disk can be found, and then the light-up command can be sent to the faulty disk to make the faulty disk light up. This allows maintenance personnel to quickly locate the actual installation location of the faulty disk based on the fault light, thus improving maintenance efficiency.

Attached Figure Description

[0048] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of this application. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0049] Figure 1 is a schematic flowchart of a distributed cluster disk fault location method disclosed in an embodiment of this application;

[0050] Figure 2 is a schematic flowchart of another distributed cluster disk fault location method disclosed in an embodiment of this application;

[0051] Figure 3 is a schematic diagram illustrating the correspondence in a set of disk lighting instructions disclosed in an embodiment of this application;

[0052] Figure 4 is a schematic flowchart of another distributed cluster disk fault location method disclosed in an embodiment of this application;

[0053] Figure 5 is a schematic flowchart of another distributed cluster disk fault location method disclosed in an embodiment of this application;

[0054] Figure 6 is a schematic diagram of the structure of a distributed cluster disk fault location system disclosed in an embodiment of this application;

[0055] Figure 7 is a schematic diagram of a distributed cluster disk fault location device disclosed in an embodiment of this application.

Detailed Implementation

[0056] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0057] This application discloses a method for locating disk faults in a distributed cluster. As shown in Figure 1, the method includes:

[0058] S11: Obtain fault information of the faulty disk in the node.

[0059] Specifically, when a disk fails, the real-time or timed monitoring function of the existing management software in the distributed cluster system can be used to detect the disk failure in a timely manner and collect the failure information of the failed disk. The failure information will record the name of the failed disk, the node it is located on, and the disk slot number, etc., for subsequent matching.

[0060] S12: Use the fault information to match the preset set of disk lighting commands and find the lighting command corresponding to the faulty disk.

[0061] Specifically, because different disk models operate in different environments, the required LED activation commands will vary due to differences in interfaces and command calls under different operating systems. This application is advantageous in distributed cluster systems, where nodes may use different operating systems and the disk models used may not be uniform. Therefore, to ensure that LED activation can still be controlled even if any disk in the distributed cluster system fails, it is necessary to pre-collect the LED activation commands for each disk model under different scenarios, i.e., different operating environments. For example, the LED activation commands used by nodes running different operating systems such as Windows Server, Netware, Unix, or Linux will differ.

[0062] Specifically, after obtaining the disk lighting commands in various scenarios, the system combines the fault information of the faulty disks that can be obtained, uses the content recorded in the fault information to locate the application scenario of the disk, uses the fault information as a queryable item, establishes the correspondence between the fault information and the disk lighting commands, and obtains a set of disk lighting commands. Thus, after obtaining the fault information of the faulty disk each time, the system can find the lighting command corresponding to the faulty disk in the set of disk lighting commands.

[0063] S13: Send a light-up command to the faulty disk to illuminate the fault light on the faulty disk.

[0064] Specifically, after receiving the LED light command for the faulty disk, the command can be sent to the faulty disk, causing it to light up its own fault LED. This allows maintenance personnel to directly observe the disk with the lit fault LED in the chassis during inspections or troubleshooting, record the disk's location, and then directly replace the faulty disk. This enables maintenance personnel to quickly locate the actual installation location of the faulty disk, improving maintenance efficiency.

[0065] For example, during routine inspections of the computer room, maintenance personnel can directly observe faulty disks with illuminated fault lights. At this time, maintenance personnel can independently record the exact location of the faulty disk in the computer room based on information such as the slot number provided on the chassis. After the inspection, the faulty disk can be replaced. Alternatively, if maintenance personnel confirm the existence of a faulty disk in the computer room through the existing management system and determine the approximate location based on the existing information, they can go to the area where the faulty disk is likely to appear to search. By observing the fault light of the faulty disk, they can quickly find and locate its position, thereby quickly performing maintenance.

[0066] As can be seen, the embodiments of this application pre-collect disk light-up commands for each disk in various scenarios and establish a correspondence between fault information and light-up commands to obtain a set of disk light-up commands. Thus, the light-up command corresponding to the faulty disk can be found based on the fault information, and then the light-up command can be sent to the faulty disk to make the faulty disk light up. This allows maintenance personnel to quickly locate the actual installation location of the faulty disk based on the fault light, thereby improving maintenance efficiency.
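Steps S11 to S13 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation disclosed in this application: the `FaultInfo` fields, the tuple key, the command template string, and the `send` callable are all hypothetical names introduced here, and the template is a placeholder rather than a real vendor command.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class FaultInfo:
    """Hypothetical record of what the fault information carries."""
    node_name: str
    disk_name: str
    slot: int
    disk_model: str
    node_model: str
    os_name: str


# Assumed layout: (disk model, node model, operating system) -> command template.
CommandSet = Dict[Tuple[str, str, str], str]


def find_light_command(info: FaultInfo, command_set: CommandSet) -> Optional[str]:
    """S12: match the fault information against the preset command set."""
    template = command_set.get((info.disk_model, info.node_model, info.os_name))
    if template is None:
        return None
    return template.format(disk=info.disk_name, slot=info.slot)


def locate_fault(info: FaultInfo, command_set: CommandSet,
                 send: Callable[[str, str], None]) -> bool:
    """S13: send the matched command to the node hosting the faulty disk."""
    command = find_light_command(info, command_set)
    if command is None:
        return False  # no command was collected for this scenario
    send(info.node_name, command)
    return True
```

Passing the transport in as `send` keeps the lookup independent of how commands actually reach a node, which the text leaves open.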

[0067] This application discloses a specific method for locating disk faults in a distributed cluster. Compared to the previous embodiment, this embodiment further explains and optimizes the technical solution. As shown in Figure 2, specifically:

[0068] S21: Obtain the operating information of the disks in the node.

[0069] Specifically, by utilizing the real-time or scheduled monitoring functions of existing management software in the distributed cluster system, the operating information of each disk in each node can be obtained periodically or in real time. For example, the smartctl command (SMART: Self-Monitoring, Analysis and Reporting Technology) can be used to obtain fields that reflect the operating status of the disk. By analyzing these fields, operating information such as disk seek failure rate and disk uncorrectable errors can be determined. The operating information can specifically include indicators such as disk seek failure rate, disk uncorrectable errors, total disk write volume, NVMe SSD media usage, NVMe SSD media errors, and SSD wear. By analyzing whether these indicators exceed preset thresholds or meet corresponding trigger conditions, it can be determined whether the disk is faulty.
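As a rough illustration of turning smartctl output into operating information, the sketch below reads a report that has already been parsed from smartctl's JSON output (for example from `smartctl --json -A <device>`). The JSON key names used here (`ata_smart_attributes`, `nvme_smart_health_information_log`, and the attribute names) reflect my understanding of that output and should be checked against the installed smartctl version. The mapping from the indicators named in the text to these fields is an assumption, and `extract_indicators` is a hypothetical helper.

```python
from typing import Any, Dict

# Assumed mapping from SMART attribute names to the indicators named in the text.
ATA_NAMES = {
    "Seek_Error_Rate": "seek_error_rate_raw",
    "Reported_Uncorrect": "uncorrectable_errors",
}
NVME_FIELDS = {
    "percentage_used": "nvme_media_usage_percent",
    "media_errors": "nvme_media_errors",
    "data_units_written": "nvme_data_units_written",
}


def extract_indicators(report: Dict[str, Any]) -> Dict[str, int]:
    """Collect the raw indicator values present in a parsed smartctl JSON report."""
    indicators: Dict[str, int] = {}
    # ATA/SATA disks: one table row per SMART attribute.
    for row in report.get("ata_smart_attributes", {}).get("table", []):
        key = ATA_NAMES.get(row.get("name"))
        if key is not None:
            indicators[key] = row["raw"]["value"]
    # NVMe disks: a flat health log.
    nvme = report.get("nvme_smart_health_information_log", {})
    for field_name, key in NVME_FIELDS.items():
        if field_name in nvme:
            indicators[key] = nvme[field_name]
    return indicators
```

Raw SMART values are vendor-specific, so a real deployment would still need per-model rules to convert them into rates or percentages before applying thresholds.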

[0070] S22: Analyze the running information to determine if the disk is faulty.

[0071] Specifically, by analyzing various indicators in the operational information, such as disk seek failure rate, uncorrectable disk errors, total disk write volume, NVMe SSD media usage, NVMe SSD media errors, and SSD wear, the system determines whether the disk is faulty based on whether these indicators exceed preset fault thresholds. For example, a single indicator can be used to determine if the disk is faulty. If the disk seek failure rate exceeds 10%, the disk is considered faulty regardless of other indicators. Alternatively, a comprehensive assessment of various indicators can be used. For instance, if the disk seek failure rate exceeds 5%, the NVMe SSD media usage exceeds 90%, and the SSD wear exceeds 20%, it can be foreseen that the disk has been overused and will quickly fail if used further. The disk seek failure rate and uncorrectable errors will increase significantly in the short term. Therefore, a comprehensive assessment of multiple indicators determines that the disk is faulty.
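The two judgment modes described above, a single-indicator rule and a combined rule, can be sketched as follows. The threshold values are the ones used as examples in the text (10% for the single rule; 5%, 90%, and 20% for the combined rule). The `DiskMetrics` type and its field names are hypothetical, and all values are assumed to be fractions between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class DiskMetrics:
    """Hypothetical container; all values are fractions in the range 0.0 to 1.0."""
    seek_failure_rate: float
    nvme_media_usage: float
    ssd_wear: float


def is_faulty(m: DiskMetrics) -> bool:
    # Single-indicator rule: a seek failure rate above 10% is decisive on its own.
    if m.seek_failure_rate > 0.10:
        return True
    # Combined rule: several moderately bad indicators together predict failure.
    return (
        m.seek_failure_rate > 0.05
        and m.nvme_media_usage > 0.90
        and m.ssd_wear > 0.20
    )
```

For instance, `is_faulty(DiskMetrics(0.06, 0.95, 0.25))` is true under the combined rule even though no single indicator crosses the 10% seek-failure threshold.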

[0072] Understandably, if the disk is not faulty, the disk will continue to be monitored without further action, and the process can return to S21.

[0073] S23: If the disk fails, the disk is identified as a faulty disk, and fault information for the faulty disk is generated.

[0074] Specifically, after determining that the disk is a faulty disk, fault information of the faulty disk can be generated. The fault information needs to include the information of the node and disk that can be matched in the disk lighting instruction set. Therefore, when generating fault information, it is necessary to obtain the node model and operating system of the node where the faulty disk is located, as well as the disk information of the faulty disk. Finally, fault information including node model, operating system and disk information is generated.

[0075] To locate the faulty disk, the disk information includes the node name, the faulty disk name, and the slot number where the faulty disk is located. The node name identifies the node where the faulty disk is located, and the disk name and slot number are then used to determine the model of the faulty disk. Furthermore, to determine the operating environment of the node where the faulty disk is located, the fault information also needs to include the node model and operating system. Different operating systems correspond to different disk lighting commands, and node models from different manufacturers also affect the disk lighting commands. Therefore, the disk lighting command set needs to establish a correspondence between disk model, node model, operating system, and disk lighting commands, so that the lighting command for the faulty disk can be matched within the disk lighting command set using the fault information.
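A minimal sketch of assembling the fault information is shown below, assuming the management software already keeps an inventory of each node's model, operating system, and per-slot disks. The `Node` type, its fields, and the dictionary keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Node:
    """Hypothetical inventory entry for one cluster node."""
    name: str
    model: str
    os_name: str
    # slot number -> (disk name, disk model)
    disks: Dict[int, Tuple[str, str]] = field(default_factory=dict)


def build_fault_info(node: Node, slot: int) -> Dict[str, object]:
    """Combine node model, operating system, and disk information into one record."""
    disk_name, disk_model = node.disks[slot]
    return {
        "node_model": node.model,
        "os": node.os_name,
        "disk": {
            "node_name": node.name,
            "disk_name": disk_name,
            "slot": slot,
            "disk_model": disk_model,
        },
    }
```

The resulting record contains every field needed as a key into the disk lighting command set (disk model, node model, operating system) together with the location fields (node name, disk name, slot).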

[0076] S24: Use the fault information to match the preset disk lighting command set to find the lighting command corresponding to the faulty disk; wherein, the disk lighting command set includes the correspondence between fault information and disk lighting commands.

[0077] Specifically, the fault information includes the disk model, node model, and operating system. The process of generating the disk lighting instruction set can include obtaining the disk lighting instructions corresponding to different operating systems for each disk model on different node models, and establishing the correspondence between disk model, node model, operating system, and disk lighting instructions to obtain the disk lighting instruction set.

[0078] For example, as shown in Figure 3, first the disk models are collected, then the corresponding node models for each disk model are established, then the correspondence between each node model and different operating systems is established, and finally the disk lighting command corresponding to each operating system is established, thus forming a correspondence between disk model, node model, operating system, and disk lighting command.
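The layered correspondence (disk model, then node model, then operating system, then command) maps naturally onto a nested dictionary. The sketch below assumes the collected commands arrive as flat four-field records; the function names and the command strings used in the usage example are placeholders, not real vendor commands.

```python
from typing import Dict, Iterable, Optional, Tuple

# disk model -> node model -> operating system -> lighting command
CommandTree = Dict[str, Dict[str, Dict[str, str]]]


def build_command_set(records: Iterable[Tuple[str, str, str, str]]) -> CommandTree:
    """Build the nested correspondence from (disk, node, os, command) records."""
    tree: CommandTree = {}
    for disk_model, node_model, os_name, command in records:
        tree.setdefault(disk_model, {}).setdefault(node_model, {})[os_name] = command
    return tree


def lookup(tree: CommandTree, disk_model: str, node_model: str,
           os_name: str) -> Optional[str]:
    """Walk the three levels; return None if any level has no entry."""
    return tree.get(disk_model, {}).get(node_model, {}).get(os_name)
```

Usage, with placeholder commands:

```python
tree = build_command_set([
    ("HDD-A", "NODE-X", "linux", "led-tool fault-on"),
    ("HDD-A", "NODE-X", "windows", "ledtool.exe /fault:on"),
])
command = lookup(tree, "HDD-A", "NODE-X", "linux")
```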

[0079] S25: Send a light-up command to the faulty disk to illuminate the fault light on the faulty disk.

[0080] Specifically, after receiving the LED light command for the faulty disk, the command can be sent to the faulty disk, causing it to light up its own fault LED. This allows maintenance personnel to directly observe the disk with the lit fault LED in the chassis during inspections or troubleshooting, record the disk's location, and then directly replace the faulty disk. This enables maintenance personnel to quickly locate the actual installation location of the faulty disk, improving maintenance efficiency.

[0081] Furthermore, this application also discloses a method for locating disk faults in a distributed cluster. As shown in Figure 4, specifically:

[0082] S31: Obtain fault information of the faulty disk in the node.

[0083] Specifically, when a disk fails, the real-time or timed monitoring function of the existing management software in the distributed cluster system can be used to detect the disk failure in a timely manner and collect the failure information of the failed disk. The failure information will record the name of the failed disk, the node it is located on, and the disk slot number, etc., for subsequent matching.

[0084] S32: Use the fault information to match the preset disk lighting command set to find the lighting command corresponding to the faulty disk; wherein, the disk lighting command set includes the correspondence between fault information and disk lighting commands.

[0085] Specifically, because different disk models operate in different environments, the required LED activation commands will vary due to differences in interfaces and command calls under different operating systems. This application is advantageous in distributed cluster systems, where nodes may use different operating systems and the disk models used may not be uniform. Therefore, to ensure that LED activation can still be controlled even if any disk in the distributed cluster system fails, it is necessary to pre-collect the LED activation commands for each disk model under different scenarios, i.e., different operating environments. For example, the LED activation commands used by nodes running different operating systems such as Windows Server, Netware, Unix, or Linux will differ.

[0086] Specifically, after obtaining the disk lighting commands in various scenarios, the system combines the fault information of the faulty disks that can be obtained, uses the content recorded in the fault information to locate the application scenario of the disk, uses the fault information as a queryable item, establishes the correspondence between the fault information and the disk lighting commands, and obtains a set of disk lighting commands. Thus, after obtaining the fault information of the faulty disk each time, the system can find the lighting command corresponding to the faulty disk in the set of disk lighting commands.

[0087] S33: Send a light-up command to the faulty disk to illuminate the fault light on the faulty disk.

[0088] Specifically, after receiving the LED light command for the faulty disk, the command can be sent to the faulty disk, causing it to light up its own fault LED. This allows maintenance personnel to directly observe the disk with the lit fault LED in the chassis during inspections or troubleshooting, record the disk's location, and then directly replace the faulty disk. This enables maintenance personnel to quickly locate the actual installation location of the faulty disk, improving maintenance efficiency.

[0089] S34: Retrieve log information of the disk in the node.

[0090] Specifically, when maintenance personnel are on duty, they usually spend most of their time monitoring in front of the user terminal. Therefore, if only the disk fault light is lit, it may not be discovered until the maintenance personnel are on patrol, resulting in the maintenance personnel not being able to detect the fault in time. For this reason, after lighting up the disk fault light, the user terminal can also display that the disk fault light is lit.

[0091] Specifically, the disk log information records the instructions received by each disk and the status changes. Therefore, whether the fault indicator is lit or turns off after being lit can be found in the log. Thus, the disk's light status can be obtained through the disk log information. Therefore, the disk log information in the node can be obtained to determine the disk's light status later.

[0092] S35: Based on the log information, query and obtain the disk's LED status.

[0093] Specifically, the disk's LED status can be found by searching for relevant information in the disk's log information. For example, if the fault LED was turned on at a certain time and there is no record of the fault LED turning off in subsequent records, then the disk's LED status is determined to be fault LED on. If no record of fault LED being turned on is found, or if a record of fault LED turning off is found at a later time, then the disk's LED status is considered to be fault LED off.
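The rule above amounts to replaying the log in time order and keeping the most recent fault-LED event. A minimal sketch follows, assuming a hypothetical log format in which the lines are already in chronological order and contain the marker `FAULT_LED_ON` or `FAULT_LED_OFF`. Real disk logs will use different wording.

```python
from typing import Iterable


def fault_led_is_on(log_lines: Iterable[str]) -> bool:
    """Return True if the latest fault-LED event in the log is a turn-on."""
    lit = False  # no turn-on record found means the LED is treated as off
    for line in log_lines:
        if "FAULT_LED_ON" in line:
            lit = True
        elif "FAULT_LED_OFF" in line:
            lit = False  # a later turn-off record overrides an earlier turn-on
    return lit
```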

[0094] S36: Display the lighting status to the user terminal for the user to view.

[0095] Specifically, the acquired lighting status will be displayed on the user's terminal screen for the user to view. This can be done using existing management software, which can display the lighting status on the corresponding page for easy viewing by the user.

[0096] In addition, this application also discloses a method for locating disk faults in a distributed cluster. As shown in Figure 5, specifically:

[0097] S41: Obtain fault information of the faulty disk in the node.

[0098] Specifically, when a disk fails, the real-time or timed monitoring function of the existing management software in the distributed cluster system can be used to detect the disk failure in a timely manner and collect the failure information of the failed disk. The failure information will record the name of the failed disk, the node it is located on, and the disk slot number, etc., for subsequent matching.

[0099] S42: Use the fault information to match the preset disk lighting command set to find the lighting command corresponding to the faulty disk; wherein, the disk lighting command set includes the correspondence between fault information and disk lighting commands.

[0100] Specifically, because different disk models operate in different environments, the required LED activation commands will vary due to differences in interfaces and command calls under different operating systems. This application is advantageous in distributed cluster systems, where nodes may use different operating systems and the disk models used may not be uniform. Therefore, to ensure that LED activation can still be controlled even if any disk in the distributed cluster system fails, it is necessary to pre-collect the LED activation commands for each disk model under different scenarios, i.e., different operating environments. For example, the LED activation commands used by nodes running different operating systems such as Windows Server, Netware, Unix, or Linux will differ.

[0101] Specifically, after obtaining the disk lighting commands in various scenarios, the system combines the fault information of the faulty disks that can be obtained, uses the content recorded in the fault information to locate the application scenario of the disk, uses the fault information as a queryable item, establishes the correspondence between the fault information and the disk lighting commands, and obtains a set of disk lighting commands. Thus, after obtaining the fault information of the faulty disk each time, the system can find the lighting command corresponding to the faulty disk in the set of disk lighting commands.

[0102] S43: Send a light-up command to the faulty disk to illuminate the fault light on the faulty disk.

[0103] Specifically, after receiving the LED light command for the faulty disk, the command can be sent to the faulty disk, causing it to light up its own fault LED. This allows maintenance personnel to directly observe the disk with the lit fault LED in the chassis during inspections or troubleshooting, record the disk's location, and then directly replace the faulty disk. This enables maintenance personnel to quickly locate the actual installation location of the faulty disk, improving maintenance efficiency.

[0104] S44: Get the attribute information of the disk in the node.

[0105] Specifically, unlike the previous embodiment, which obtains the disk's LED status from log information, this embodiment can directly use a disk LED status query command to query the current LED status of the disk. To obtain the disk LED status query command, similar to the aforementioned disk LED command, it is necessary to match the corresponding LED query command in a preset disk LED status query command set using the disk's attribute information. The attribute information may include the disk model, the model of the node hosting the disk, and the node's operating system.

[0106] S45: Use attribute information to match in the preset disk lighting status query instruction set to find the lighting query instruction corresponding to the disk.

[0107] Specifically, the disk indicator status query instruction set includes the correspondence between attribute information and indicator query instructions. The attribute information may include the disk model, the model of the node hosting the disk, and the node's operating system. The generation process of the disk indicator status query instruction set may include obtaining the indicator status query instructions corresponding to different operating systems on different node models for each disk model, and establishing the correspondence between disk model, node model, operating system, and indicator status query instructions to obtain the disk indicator status query instruction set.

[0108] Specifically, the principles behind the attribute information and the disk light status query instruction set in this embodiment are the same as those behind the aforementioned fault information and disk light instruction set, so the two descriptions can be cross-referenced.

[0109] S46: Send the light-up query command to the disk;

[0110] S47: Receive the lamp status feedback from the disk and display the lamp status to the user terminal for the user to view.

[0111] Specifically, after the corresponding lighting query command is found, it can be sent to the corresponding disk, and the lighting status fed back by the disk can be received. The lighting status can then be displayed on the user terminal for the user to view. In this case, the lighting status can be displayed on the corresponding page of the existing management software for easy viewing by the user.
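Steps S45 to S47 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the instruction strings, the model names, and the `send` callable (which stands in for whatever channel delivers an instruction to a disk and returns its feedback) are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

# Key: (disk model, node model, operating system).
AttributeKey = Tuple[str, str, str]

# Hypothetical query instruction set. The instruction strings are
# placeholders and do not belong to any real tool.
QUERY_INSTRUCTIONS: Dict[AttributeKey, str] = {
    ("DISK-A", "NODE-X", "Linux"): "example-led-tool status --slot {slot}",
    ("DISK-A", "NODE-X", "Windows Server"): "example-led-tool.exe /status /slot:{slot}",
}


def query_led_status(
    attributes: AttributeKey,
    slot: str,
    send: Callable[[str], str],
) -> Optional[str]:
    """Match the query instruction (S45), send it (S46), return the feedback (S47)."""
    template = QUERY_INSTRUCTIONS.get(attributes)
    if template is None:
        # No instruction was collected for this disk/node/OS combination.
        return None
    return send(template.format(slot=slot))
```

Passing the transport in as a callable keeps the matching logic independent of how a given node actually executes the instruction.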

[0112] Furthermore, so that users can view the lighting status of each disk at any time, the lighting status of each disk can be stored persistently. To this end, after the lighting status fed back by the disk is received, it can be saved directly to a database. The lighting status recorded in the database can then be queried automatically and displayed on the user terminal. Alternatively, the system can receive a query command from the user, query the lighting status of each disk in the database, and display it.
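The persistent storage described above could look roughly like the following sketch. The source does not name a database, so the use of SQLite, the table name `led_status`, and its columns are assumptions made for illustration only.

```python
import sqlite3
from typing import List, Optional, Tuple


def open_status_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; one row per disk slot on a node."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS led_status ("
        " node TEXT NOT NULL,"
        " slot TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " PRIMARY KEY (node, slot))"
    )
    return conn


def save_status(conn: sqlite3.Connection, node: str, slot: str, status: str) -> None:
    """Save the latest lighting status fed back by a disk, replacing any older row."""
    conn.execute(
        "INSERT OR REPLACE INTO led_status (node, slot, status) VALUES (?, ?, ?)",
        (node, slot, status),
    )
    conn.commit()


def query_status(
    conn: sqlite3.Connection, node: Optional[str] = None
) -> List[Tuple[str, str, str]]:
    """Return stored statuses, for one node or for the whole cluster."""
    if node is None:
        cursor = conn.execute(
            "SELECT node, slot, status FROM led_status ORDER BY node, slot"
        )
    else:
        cursor = conn.execute(
            "SELECT node, slot, status FROM led_status WHERE node = ? ORDER BY slot",
            (node,),
        )
    return cursor.fetchall()
```

With a file path instead of `":memory:"`, the statuses survive a restart of the management software, which is the point of the persistence step.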

[0113] Accordingly, this application also discloses a distributed cluster disk fault location system. As shown in Figure 6, the system includes:

[0114] The fault information acquisition module is used to acquire fault information of faulty disks in the node.

[0115] Specifically, when a disk fails, the real-time or timed monitoring function of the existing management software in the distributed cluster system can be used to detect the disk failure in a timely manner and collect the failure information of the failed disk. The failure information will record the name of the failed disk, the node it is located on, and the disk slot number, etc., for subsequent matching.

[0116] The light-up command query module is used to match fault information with a preset set of disk light-up commands to find the light-up command corresponding to the faulty disk; the set of disk light-up commands includes the correspondence between fault information and disk light-up commands.

[0117] Specifically, because different disk models operate in different environments, the required LED activation commands vary with the interfaces and command calls available under each operating system. This matters in distributed cluster systems, where nodes may run different operating systems and the disk models in use may not be uniform. Therefore, to ensure that the fault LED can be controlled whichever disk in the distributed cluster system fails, it is necessary to pre-collect the LED activation commands for each disk model under different scenarios, i.e., different operating environments. For example, the LED activation commands used by nodes running different operating systems such as Windows Server, NetWare, Unix, or Linux will differ.

[0118] Specifically, after the disk lighting commands for the various scenarios have been obtained, they are combined with the fault information that can be obtained for a faulty disk. The content recorded in the fault information identifies the application scenario of the disk, so the fault information is used as the query key: the correspondence between fault information and disk lighting commands is established, yielding the set of disk lighting commands. Thus, each time the fault information of a faulty disk is obtained, the system can find the lighting command corresponding to that disk in the set of disk lighting commands.

[0119] The light-up command sending module is used to send light-up commands to the faulty disk so that the faulty disk lights up the fault light.

[0120] Specifically, after receiving the LED light command for the faulty disk, the command can be sent to the faulty disk, causing it to light up its own fault LED. This allows maintenance personnel to directly observe the disk with the lit fault LED in the chassis during inspections or troubleshooting, record the disk's location, and then directly replace the faulty disk. This enables maintenance personnel to quickly locate the actual installation location of the faulty disk, improving maintenance efficiency.

[0121] As can be seen, the embodiments of this application pre-collect disk light-up commands for each disk in various scenarios and establish a correspondence between fault information and light-up commands to obtain a set of disk light-up commands. Thus, the light-up command corresponding to the faulty disk can be found based on the fault information, and then the light-up command can be sent to the faulty disk to make the faulty disk light up. This allows maintenance personnel to quickly locate the actual installation location of the faulty disk based on the fault light, thereby improving maintenance efficiency.
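The correspondence summarized above can be pictured as a lookup table built once from the pre-collected commands and queried with each new piece of fault information. The sketch below is illustrative only: the dictionary field names, the `{slot}` placeholder, and the command strings are assumptions, not commands of any real tool.

```python
from typing import Dict, Iterable, Optional, Tuple

# Key: (disk model, node model, operating system).
ScenarioKey = Tuple[str, str, str]


def build_command_set(
    collected: Iterable[Tuple[str, str, str, str]],
) -> Dict[ScenarioKey, str]:
    """Build the light-up command set from pre-collected
    (disk model, node model, operating system, command) records."""
    command_set: Dict[ScenarioKey, str] = {}
    for disk_model, node_model, operating_system, command in collected:
        command_set[(disk_model, node_model, operating_system)] = command
    return command_set


def find_light_up_command(
    command_set: Dict[ScenarioKey, str],
    fault_info: Dict[str, str],
) -> Optional[str]:
    """Use the fault information as the query key and fill in the slot number."""
    key = (
        fault_info["disk_model"],
        fault_info["node_model"],
        fault_info["operating_system"],
    )
    template = command_set.get(key)
    if template is None:
        return None  # this scenario was never collected
    return template.format(slot=fault_info["slot"])


# Hypothetical pre-collected records.
COMMAND_SET = build_command_set([
    ("DISK-A", "NODE-X", "Linux", "example-led-tool fault-on --slot {slot}"),
    ("DISK-A", "NODE-X", "Windows Server", "example-led-tool.exe /fault:on /slot:{slot}"),
])
```

For a fault record such as `{"disk_model": "DISK-A", "node_model": "NODE-X", "operating_system": "Linux", "slot": "7"}`, the lookup returns the Linux entry with the slot filled in; an uncollected combination returns `None`, which a caller could surface as a gap in the command set.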

[0122] Specifically, the fault information acquisition module includes: an operation information acquisition unit, a disk status analysis unit, and a fault information generation unit; among which,

[0123] The operation information acquisition unit is used to acquire the operation information of the disks in the node;

[0124] The disk status analysis unit is used to analyze operating information and determine whether the disk is faulty;

[0125] The fault information generation unit is used to identify the disk as a faulty disk and generate fault information for the faulty disk if the disk status analysis unit determines that the disk is faulty.

[0126] Specifically, the disk status analysis unit can be used to analyze operational information and determine whether indicators such as disk seek failure rate, disk uncorrectable errors, total disk write volume, NVMe SSD media usage, NVMe SSD media errors, and SSD wear exceed preset fault thresholds.
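A threshold check of this kind might be sketched as follows. The indicator names and threshold values are invented for illustration, and treating a disk as faulty when any single indicator exceeds its threshold is an assumption; the source does not state how the indicators are combined.

```python
from typing import Dict, List

# Hypothetical preset fault thresholds, one per indicator.
FAULT_THRESHOLDS: Dict[str, float] = {
    "seek_error_rate": 0.05,
    "uncorrectable_errors": 0,
    "total_written_tb": 600,
    "nvme_media_used_percent": 90,
    "nvme_media_errors": 0,
    "ssd_wear_percent": 95,
}


def exceeded_indicators(
    operation_info: Dict[str, float],
    thresholds: Dict[str, float] = FAULT_THRESHOLDS,
) -> List[str]:
    """Return the indicators whose reported value exceeds its threshold.

    Indicators missing from the operation information are skipped, since
    not every disk type reports every indicator.
    """
    return [
        name
        for name, limit in thresholds.items()
        if name in operation_info and operation_info[name] > limit
    ]


def is_faulty(operation_info: Dict[str, float]) -> bool:
    """Assumption: one exceeded indicator is enough to mark the disk faulty."""
    return bool(exceeded_indicators(operation_info))
```

Returning the list of exceeded indicators, rather than only a boolean, lets the fault information record why the disk was flagged.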

[0127] Specifically, the fault information generation unit includes: a node information acquisition subunit, a disk information acquisition subunit, and a fault information generation subunit; wherein,

[0128] The node information acquisition subunit is used to obtain the node model and operating system of the node where the faulty disk is located;

[0129] The disk information acquisition subunit is used to acquire disk information of the faulty disk;

[0130] The fault information generation subunit is used to generate fault information including node model, operating system, and disk information.
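The three subunits can be read as filling in one record. The sketch below assumes the node and disk details arrive as plain dictionaries with the field names shown; those names, and the `FaultInfo` class itself, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class FaultInfo:
    """Fault information: node model, operating system, and disk information."""
    node_name: str
    node_model: str
    operating_system: str
    disk_name: str
    disk_model: str
    slot: str

    def scenario_key(self) -> Tuple[str, str, str]:
        """Key used to look up the light-up command for this disk."""
        return (self.disk_model, self.node_model, self.operating_system)


def generate_fault_info(node: Dict[str, str], disk: Dict[str, str]) -> FaultInfo:
    """Combine node details and disk details into one fault record."""
    return FaultInfo(
        node_name=node["name"],
        node_model=node["model"],
        operating_system=node["operating_system"],
        disk_name=disk["name"],
        disk_model=disk["model"],
        slot=disk["slot"],
    )
```

Keeping the record immutable (`frozen=True`) means the same fault information can safely be used both as a lookup key and in logs.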

[0131] Specifically, the system also includes: a light-up command acquisition module and an instruction set establishment module; wherein,

[0132] The light-up command acquisition module is used to acquire the disk light-up commands corresponding to different operating systems used on different node models for each disk model.

[0133] The instruction set establishment module is used to establish the correspondence between disk model, node model, operating system and disk lighting instructions, and obtain the disk lighting instruction set.

[0134] Specifically, the system may also include: a log information acquisition module, a light status query module, and a first light status display module; wherein,

[0135] The log information acquisition module is used to acquire log information from the disks in the node;

[0136] The light status query module is used to query and obtain the light status of the disk based on the log information;

[0137] The first light status display module is used to display the light status to the user terminal for the user to view.

[0138] Specifically, the system may also include: an attribute information acquisition module, a query command matching module, a query command sending module, and a second light status display module; wherein,

[0139] The attribute information acquisition module is used to acquire the attribute information of the disks in the node;

[0140] The query instruction matching module is used to match attribute information in a preset set of disk lighting status query instructions to find the lighting query instruction corresponding to the disk.

[0141] The query command sending module is used to send the light status query command to the disk;

[0142] The second light status display module is used to receive the light status fed back by the disk and display the light status to the user terminal for the user to view.

[0143] Specifically, the second lighting status display module may include: a lighting status receiving unit, a lighting status storage unit, and a lighting status display subunit; wherein,

[0144] The lighting status receiving unit is used to receive the lighting status feedback from the disk.

[0145] The lighting status storage unit is used to save the lighting status to the database;

[0146] The lighting status display subunit is used to query the lighting status recorded in the database and display it to the user terminal.

[0147] Figure 7 is a structural diagram of a distributed cluster disk fault location device provided in an embodiment of this application. As shown in Figure 7, in this embodiment the distributed cluster disk fault location device may specifically be an electronic device, which may include: a memory 20 for storing computer programs;

[0148] The processor 21 is used to implement the distributed cluster disk fault location method as described above when executing the computer programs.

[0149] The electronic devices provided in this embodiment may include, but are not limited to, smartphones, tablets, laptops, or desktop computers.

[0150] The processor 21 may include one or more processing cores, such as a quad-core processor or an octa-core processor. The processor 21 may be implemented using at least one hardware form selected from DSP (Digital Signal Processor), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 21 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is used to process data in the awake state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, the processor 21 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, the processor 21 may also include an AI (Artificial Intelligence) processor, which is used to handle computational operations related to machine learning.

[0151] The memory 20 may include one or more computer-readable storage media, which may be non-transitory. The memory 20 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In this embodiment, the memory 20 is used to store at least the following computer program 201, which, after being loaded and executed by the processor 21, is capable of implementing the relevant steps of the distributed cluster disk fault location method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 20 may also include an operating system 202 and data 203, and the storage method may be temporary or permanent storage. The operating system 202 may include Windows, Unix, Linux, etc. The data 203 may include, but is not limited to, set offsets.

[0152] In some embodiments, the electronic device may further include a display screen 22, an input / output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.

[0153] Those skilled in the art will understand that the structure shown in Figure 7 does not constitute a limitation on the electronic device, which may include more or fewer components than those shown.

[0154] It is understood that if the distributed cluster disk fault location method in the above embodiments is implemented as a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and executes all or part of the steps of the methods in the various embodiments of this application. The aforementioned storage medium includes: USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), electrically erasable programmable ROM, register, hard disk, removable disk, CD-ROM, magnetic disk, or optical disk, and other media capable of storing program code.

[0155] Based on this, this application also discloses a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the distributed cluster disk fault location method as described above.

[0156] The foregoing has provided a detailed description of a distributed cluster disk fault location method, system, apparatus, and computer-readable storage medium provided in the embodiments of this application. The various embodiments in the specification are described in a progressive manner, with each embodiment focusing on its differences from other embodiments. Similar or identical parts between embodiments can be referred to interchangeably. For the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, the description is relatively simple; relevant parts can be referred to in the method section.

[0157] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0158] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0159] The technical content provided in this application has been described in detail above. Specific examples have been used to illustrate the principles and implementation methods of this application. The description of the above embodiments is only for the purpose of helping to understand the method and core ideas of this application. At the same time, for those skilled in the art, there will be changes in the specific implementation methods and application scope based on the ideas of this application. Therefore, the content of this specification should not be construed as a limitation of this application.

Claims

1. A distributed cluster disk fault locating method, characterized in that it comprises: obtaining fault information of a faulty disk in a node; matching the fault information against a preset disk light-up command set to find the light-up command corresponding to the faulty disk; and sending the light-up command to the faulty disk to make the faulty disk light up; wherein the disk light-up command set includes the correspondence between fault information and disk light-up commands; the process of obtaining the fault information of the faulty disk includes: obtaining the node model and operating system of the node where the faulty disk is located; obtaining the disk information of the faulty disk; and generating fault information including the node model, the operating system, and the disk information; and the generation process of the disk light-up command set includes: obtaining, for each disk model, the disk light-up commands corresponding to different operating systems on different node models; and establishing the correspondence between disk model, node model, operating system, and disk light-up command to obtain the disk light-up command set.

2. The method of claim 1, wherein the process of obtaining fault information of the faulty disk in the node includes: obtaining operation information of the disk in the node; analyzing the operation information to determine whether the disk is faulty; and, if the disk is faulty, identifying the disk as the faulty disk and generating fault information for the faulty disk.

3. The method of claim 2, wherein the process of analyzing the operation information and determining whether the disk is faulty includes: analyzing the operation information to determine whether the disk seek failure rate, disk uncorrectable errors, total disk write volume, NVMe SSD media usage, NVMe SSD media errors, and SSD wear indicators exceed preset fault thresholds.

4. The method of claim 1, further comprising: obtaining log information of the disk in the node; querying and obtaining the lighting status of the disk based on the log information; and displaying the lighting status to the user terminal for the user to view.

5. The method of claim 1, further comprising: obtaining attribute information of the disk in the node; matching the attribute information against a preset disk lighting status query instruction set to find the lighting query instruction corresponding to the disk; sending the lighting query instruction to the disk; and receiving the lighting status fed back by the disk and displaying the lighting status to the user terminal for the user to view.

6. The method of claim 5, wherein the process of receiving the lighting status fed back by the disk and displaying the lighting status to the user terminal includes: receiving the lighting status fed back by the disk; saving the lighting status to a database; and querying the lighting status recorded in the database and displaying it on the user terminal.

7. A distributed cluster disk fault localization system, comprising: a fault information acquisition module, used to acquire fault information of a faulty disk in a node; a light-up command query module, used to match the fault information against a preset disk light-up command set to find the light-up command corresponding to the faulty disk; and a light-up command sending module, used to send the light-up command to the faulty disk so that the faulty disk lights up its fault light; wherein the disk light-up command set includes the correspondence between fault information and disk light-up commands; the fault information acquisition module includes a node information acquisition subunit, a disk information acquisition subunit, and a fault information generation subunit; the node information acquisition subunit is used to obtain the node model and operating system of the node where the faulty disk is located; the disk information acquisition subunit is used to acquire disk information of the faulty disk; the fault information generation subunit is used to generate fault information including the node model, the operating system, and the disk information; the system also includes a light-up command acquisition module and an instruction set establishment module; the light-up command acquisition module is used to acquire, for each disk model, the disk light-up commands corresponding to different operating systems on different node models; and the instruction set establishment module is used to establish the correspondence between disk model, node model, operating system, and disk light-up command to obtain the disk light-up command set.

8. A distributed cluster disk fault locating apparatus, characterized in that it comprises: a memory, used to store a computer program; and a processor, used to execute the computer program to implement the distributed cluster disk fault location method as described in any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the distributed cluster disk fault location method as described in any one of claims 1 to 6.