A server failure early warning method and computing device
By generating risk alerts through the rectification and early warning database, the problem of delayed rectification of server failures has been solved, timely early warning of potential risks has been achieved, human resource waste has been reduced, and the rectification cycle has been shortened.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HENAN KUNLUN TECH CO LTD
- Filing Date
- 2023-10-18
- Publication Date
- 2026-06-16
Figure CN117539729B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a server fault early warning method and computing device.
Background Technology
[0002] As business demands on server clusters increase, the number of server hardware and software components also grows to meet these demands. Among these numerous servers, a hardware failure can lead to a decline in overall server performance, the issuance of error messages, and in severe cases, server downtime, significantly impacting business operations and availability. Therefore, maintaining server hardware is an indispensable and crucial measure.
[0003] In current technology, when a server malfunctions during use or production, corrective measures are taken to address the issue, such as firmware upgrades or configuration modifications, thereby resolving the problem. Furthermore, a rectification notice is issued based on these measures. Other servers may also face the potential risk of experiencing the same malfunction, but they typically can only wait for the malfunction to occur before determining the corresponding previously issued rectification notice. Therefore, the purchaser of the affected server then contacts maintenance personnel to resolve the issue based on the rectification notice.
[0004] Therefore, in current technology, after rectifying a fault in a specific server and issuing a rectification notice, if other servers with potential risks of the same fault are to be rectified, the only recourse is to passively respond after the fault occurs, and then have the customer contact maintenance personnel to investigate and rectify the fault. This passive approach to fault response and rectification is somewhat delayed and wastes a significant amount of human resources in the investigation and the entire process.
Summary of the Invention
[0005] This application provides a server fault early warning method and computing device, which can solve the lag in passive fault response, prevent potential risks of server problems in advance, and provide timely early warning of possible server problems.
[0006] To achieve the above objectives, this application adopts the following technical solution:
[0007] Firstly, this application provides a server fault early warning method applied to a first server. The method includes: determining a target server corresponding to the first server based on preset rules; obtaining configuration information of the target server; determining, based on the target server's configuration information and using a rectification early warning database, whether a server requiring rectification exists within the target server; the rectification early warning database includes rectification issues and server configuration information corresponding to those issues; generating a risk alarm when a server requiring rectification is determined to exist within the target server; and sending the risk alarm to the target server to alert the user of potential risks associated with the server requiring rectification. By determining the server requiring rectification based on the rectification early warning database and generating and sending a risk alarm to alert the user of potential risks, this method eliminates the need to wait for a server failure before responding passively. It addresses the lag in fault rectification in current technologies, providing timely early warnings of potential server problems to prevent potential risks in advance.
[0008] In one possible implementation, based on the rectification warning database, it is determined whether the target server's configuration information exists in the database. If the target server's configuration information exists in the database, it is determined that a server to be rectified exists within the target server group; if the target server's configuration information does not exist in the database, it is determined that no server to be rectified exists within the target server group. By using the rectification issues and corresponding configuration information stored in the rectification warning database, the server to be rectified can be identified, simplifying the process and shortening the time required for identification. This improves the timeliness of warnings regarding potential risks of server rectification issues.
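The membership check described in [0008] can be sketched as follows. This is a minimal illustration only, assuming the rectification early warning database is an in-memory mapping keyed by (model, firmware version); the names `ServerConfig`, `RECTIFICATION_DB`, and `find_servers_to_rectify`, and all sample values, are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    """Configuration reported by one target server (illustrative fields)."""
    device_id: str
    model: str
    firmware_version: str


# Hypothetical database: (model, firmware version) -> rectification issue.
RECTIFICATION_DB = {
    ("model-x", "1.0.0"): "oversized diagnostic database causes repeated resets",
}


def find_servers_to_rectify(targets, db):
    """Return the targets whose configuration exists in the database."""
    return [t for t in targets if (t.model, t.firmware_version) in db]


targets = [
    ServerConfig("srv-b", "model-x", "1.0.0"),  # matches a stored configuration
    ServerConfig("srv-c", "model-x", "1.2.0"),  # not in the database
]
flagged = find_servers_to_rectify(targets, RECTIFICATION_DB)
print([t.device_id for t in flagged])
```

An empty result corresponds to the case in which no server to be rectified exists among the target servers.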
[0009] In one possible implementation, a risk alarm is generated based on the device identifier of the server to be rectified; this risk alarm carries the device identifier of the server to be rectified. The risk alarm explicitly carries the device identifier of the server to be rectified, so that subsequent users can clearly identify information about servers with potential rectification issues.
[0010] In one possible implementation, risk alerts are sent to all target servers. Based on the device identifier of the server to be rectified carried in the risk alert, the user is alerted that the server to be rectified has potential risks. By sending risk alerts to all target servers, users can clearly identify servers with potential rectification issues based on the device identifier. At the same time, the risk alert also serves as a notification to other servers, meaning that users can proactively conduct a secondary screening for fault risks based on the device identifier of the server to be rectified.
[0011] In one possible implementation, the risk alert is sent to the server to be rectified based on the device identifier carried in the risk alert, thus alerting the user to the potential risks associated with the server. This provides accurate notification of the potential risks of rectification issues on the server without requiring further confirmation from the user.
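The two delivery options in [0010] and [0011] differ only in who receives the alarm. The sketch below illustrates that choice under the assumption that an alarm is a plain dictionary carrying the device identifiers of the servers to be rectified; `route_alarm` and the identifiers are hypothetical names introduced for illustration.

```python
def route_alarm(alarm, target_ids, broadcast):
    """Return the device identifiers that should receive the alarm.

    broadcast=True  -> every target server receives it (as in [0010]).
    broadcast=False -> only servers named in the alarm receive it (as in [0011]).
    """
    if broadcast:
        return list(target_ids)
    flagged = set(alarm["device_ids"])
    return [device_id for device_id in target_ids if device_id in flagged]


alarm = {"device_ids": ["srv-c"]}          # hypothetical alarm payload
targets = ["srv-b", "srv-c", "srv-d"]      # hypothetical target servers

everyone = route_alarm(alarm, targets, broadcast=True)
only_flagged = route_alarm(alarm, targets, broadcast=False)
print(everyone, only_flagged)
```

Broadcasting lets users of the other target servers run their own secondary screening, while targeted delivery avoids any further confirmation step.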
[0012] In one possible implementation, the rectification issues corresponding to the servers to be rectified are identified based on a rectification warning database. A risk alarm is generated based on the device identifier of the server to be rectified and the rectification issues involved. The risk alarm carries the device identifier of the server to be rectified and the related rectification issues. While alerting the user to the potential risk of rectification issues arising on the server to be rectified, the system also clearly provides detailed information about the possible rectification issues.
[0013] In one possible implementation, the rectification early warning database further includes: rectification announcement information for rectification issues; determining the rectification issues corresponding to the servers to be rectified based on the rectification early warning database; determining the rectification announcement information for the rectification issues corresponding to the servers to be rectified based on the rectification early warning database; generating a risk alarm based on the device identifier of the server to be rectified, the corresponding rectification issue, and the rectification announcement information; the risk alarm carries the device identifier of the server to be rectified, the corresponding rectification issue, and the rectification announcement information. While alerting users to the potential risks of rectification issues occurring on the servers to be rectified and providing detailed information about the possible rectification issues, users can proactively respond to the rectification announcement information, obtain the rectification announcement issued for the rectification issue, and perform pre-emptive rectification of the target server according to the rectification announcement, thereby avoiding the potential risks of rectification issues. Furthermore, the problem description and problem localization are completed after the rectification announcement is generated, reducing the redundant waste of human resources, shortening the rectification cycle, and avoiding prolonged impact on business operations and availability.
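A risk alarm that carries the device identifier, the rectification issue, and the rectification announcement information, as described in [0012] and [0013], might be assembled as in the following sketch. The `RiskAlarm` type, the `build_alarm` helper, the database layout, and the announcement number are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RiskAlarm:
    """Alarm payload; announcement_ref is optional, as in [0012] versus [0013]."""
    device_id: str
    issue: str
    announcement_ref: Optional[str] = None


def build_alarm(device_id, config_key, db):
    """Build an alarm for one server, or return None if its config is not listed."""
    entry = db.get(config_key)
    if entry is None:
        return None
    return RiskAlarm(device_id, entry["issue"], entry.get("announcement"))


# Hypothetical database entry keyed by (model, firmware version).
db = {
    ("model-x", "1.0.0"): {
        "issue": "repeated management controller resets",
        "announcement": "NOTICE-0001",
    },
}

alarm = build_alarm("srv-b", ("model-x", "1.0.0"), db)
no_alarm = build_alarm("srv-c", ("model-x", "1.2.0"), db)
print(alarm, no_alarm)
```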
[0014] In one possible implementation, the rectification early warning database further includes: server rectification measures corresponding to rectification issues; determining rectification issues corresponding to servers to be rectified based on the rectification early warning database; obtaining server rectification measures corresponding to rectification issues based on the rectification early warning database; and rectifying the servers to be rectified using the corresponding server rectification measures. Rectifying the servers to be rectified first effectively extends the time before rectification issues may occur on the servers to be rectified, providing users with sufficient time for server upgrades / updates.
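The automatic rectification in [0014] can be pictured as a lookup from a rectification issue to a stored measure that is then applied to the server. The sketch below models a measure as a function that returns an updated configuration; the issue key `bmc-reset-loop`, the version numbers, and the helper names are hypothetical, and a real measure would act on the server itself rather than on a dictionary.

```python
def upgrade_firmware(config, component, new_version):
    """Return a copy of config with one firmware component upgraded."""
    updated = dict(config)
    firmware = dict(updated["firmware"])
    firmware[component] = new_version
    updated["firmware"] = firmware
    return updated


# Hypothetical mapping: rectification issue -> server rectification measure.
MEASURES = {
    "bmc-reset-loop": lambda cfg: upgrade_firmware(cfg, "BMC", "1.2.0"),
}


def rectify(config, issue_id, measures):
    """Apply the stored measure for issue_id; leave config unchanged if none exists."""
    measure = measures.get(issue_id)
    if measure is None:
        return config
    return measure(config)


before = {"device_id": "srv-b", "firmware": {"BMC": "1.0.0", "BIOS": "2.3.1"}}
after = rectify(before, "bmc-reset-loop", MEASURES)
print(after["firmware"])
```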
[0015] In one possible implementation, the preset rule is that the target server is in the same network segment as the first server and belongs to the same vendor as the first server. In this case, one or more second servers in the same network segment as the first server are determined using the ping program; the vendor identifiers of the one or more second servers are obtained through the IPMI interface; and, based on those vendor identifiers, the target server that belongs to the same vendor as the first server is determined from the one or more second servers. This method can quickly identify the target server that is in the same network segment as the first server and belongs to the same vendor as the first server.
[0016] Secondly, this application provides a server fault early warning device, comprising: a configuration information acquisition module for acquiring configuration information of a target server; a server judgment module for determining, based on the configuration information of the target server and using a rectification early warning database, whether there is a server to be rectified among the target servers; the rectification early warning database includes rectification issues and server configuration information corresponding to the rectification issues; a risk alarm generation module for generating a risk alarm when it is determined that there is a server to be rectified among the target servers; and a risk alarm sending module for sending the risk alarm to the target server to alert the user that the server to be rectified has potential risks. This eliminates the need to wait for a server fault to occur before responding passively, solving the lag in fault rectification in current technologies, and providing timely early warnings of potential server problems to prevent potential risks from server issues.
[0017] Thirdly, this application provides a computing device including a processor and a memory communicatively connected to the processor; the memory is used to store computer execution instructions; the processor is used to execute the computer execution instructions stored in the memory, causing the processor to perform the method described in the first aspect.
[0018] Fourthly, this application provides a computer-readable storage medium storing a computer program or instructions that, when executed, implement the method described in the first aspect.
[0019] Fifthly, this application provides a computer program product, including a computer program or instructions that, when executed by a processor, implement the method described in the first aspect.
Attached Figure Description
[0020] Figure 1 is a schematic diagram of the structure of a computing device provided in an embodiment of this application;
[0021] Figure 2 is a flowchart illustrating a server fault early warning method provided in an embodiment of this application;
[0022] Figure 3 is a schematic diagram illustrating a server fault early warning method provided in an embodiment of this application;
[0023] Figure 4 is a schematic diagram illustrating a scenario for yet another server fault early warning method provided in an embodiment of this application;
[0024] Figure 5 is a schematic diagram illustrating a scenario for yet another server fault early warning method provided in an embodiment of this application;
[0025] Figure 6 is a schematic diagram illustrating a scenario for yet another server fault early warning method provided in an embodiment of this application;
[0026] Figure 7 is a flowchart illustrating another server fault early warning method provided in an embodiment of this application;
[0027] Figure 8 is a flowchart illustrating another server fault early warning method provided in an embodiment of this application;
[0028] Figure 9 is a schematic diagram of a server fault early warning device provided in an embodiment of this application.
Detailed Implementation
[0029] The terms "first," "second," and "third," etc., used in this application specification, claims, and drawings are used to distinguish different objects, not to limit a specific order.
[0030] In the embodiments of this application, the terms "exemplary" or "for example" are used to indicate that something is an example, illustration, or description. Any embodiment or design that is described as "exemplary" or "for example" in the embodiments of this application should not be construed as being more preferred or advantageous than other embodiments or design. Specifically, the use of the terms "exemplary" or "for example" is intended to present the relevant concepts in a specific manner.
[0031] To ensure clarity and conciseness in the description of the following embodiments, a brief introduction to the related technologies is given first:
[0032] The Intelligent Platform Management Interface (IPMI) is a standard used in the design of server management systems. Designing with this interface standard helps to implement system management on different types of server system hardware, making centralized management of different platforms possible.
[0033] Redfish is an open industry standard specification released by the Distributed Management Task Force (DMTF) to modernize and securely manage platform hardware. It is a management standard that can represent various implementations through a consistent interface.
[0034] The advantages of the server fault early warning method provided in this application are explained below in comparison with the problems existing in the current technology.
[0035] In current technology, when a server experiences a failure (problem) during use or production, the server is rectified (e.g., firmware version upgrade, firmware configuration modification, etc.) to obtain an updated server version that resolves the issues present in the original version. A rectification announcement is also released to address the problem. For ease of understanding, the following example illustrates this:
[0036] The server encountered an issue during use or production where an excessively large intelligent diagnostic database caused repeated IBMC resets. Therefore, a problem description is needed (including: involved hardware configuration, problem symptoms, etc.) for this issue. After identifying the root cause of the problem, server rectification should be implemented, and a rectification announcement should be issued. The rectification announcement may include: actual hardware configuration, scope of applications involved, estimated completion time, manpower investment, modification history, contact information of maintenance personnel, problem keywords, problem summary, and problem description. The problem description may include: triggering conditions and problem symptoms.
[0037] However, for other server devices with potential risks, the current remediation mechanism passively responds after a remediation notice has been issued for the potentially risky server device. The customer then contacts maintenance personnel based on the notice to investigate and rectify the problem. This passive approach to server issues is inherently delayed, failing to promptly identify potentially risky servers and prevent future problems. Furthermore, this remediation method necessitates manual investigation and rectification of the same fault on different server nodes, wasting significant human resources in the process.
[0038] Troubleshooting server issues takes a considerable amount of time. This is because resolving server problems involves multiple steps, including at least: the customer contacting maintenance personnel, the maintenance personnel locating and rectifying the problem on the server node, and the server being put back into use after rectification. As a result, the server cannot be used for an extended period of time, which seriously affects the operation and availability of business.
[0039] This application provides a server fault early warning method applied to a first server, which is the latest version of the server. The first server obtains configuration information of target servers located on the same network segment and belonging to the same vendor. The first server compares the obtained target server configuration information with the configuration information in a pre-created rectification early warning database to determine whether any target server needs rectification. If so, a risk alarm is generated and sent to the target server. In other words, the rectification early warning database is used to determine whether the target server has potential risks, and if a potential risk exists, the first server proactively sends a risk alarm to alert the user that the target server may have rectification issues. This solves the problem of delayed, passive fault responses and provides timely early warning of potential server problems so that they can be prevented in advance.
[0040] To facilitate understanding of the technical solution of this application, a computing device provided in the embodiments of this application will be introduced first below.
[0041] For example, Figure 1 is a structural schematic of a computing device according to an embodiment of this application. It is understood that this computing device (also referred to herein as a "server") can be a personal computer, a physical server, a cloud server, a workstation, a hyperterminal, etc., but is not limited thereto. As shown in Figure 1, the hardware of the computing device 10 includes a processor 110, memory 120, and management controller 130; the software of the computing device 10 mainly includes an operating system (OS) 140 and processor firmware 150.
[0042] The processor 110 may include various processing devices, such as a central processing unit (CPU), a system-on-a-chip (SoC), a processor integrated on an SoC, a separate processor chip, or a controller. The processor 110 may also include special-purpose processing devices, such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and digital signal processors (DSPs). The processor 110 may be a processor group consisting of multiple processors, which are coupled to each other via one or more buses.
[0043] The memory 120, also known as internal memory or main memory, can be coupled to the processor 110. Specifically, the memory 120 can be coupled to the processor 110 through one or more memory controllers. There can be one or more memories 120, which can be volatile memory, such as random access memory (RAM), or other types of dynamic storage devices that can store information and instructions, serving as the running memory of the computing device 10.
[0044] The management controller 130 is used to remotely maintain and manage the computing device 10 through a dedicated data channel. The management controller 130 is completely independent of the operating system of the computing device 10 and can communicate with the processor, memory, etc. through its out-of-band management interface.
[0045] For example, the management controller 130 may include a management unit for the operating status of the computing device, a management system in a management chip outside the processor, an out-of-band management controller (baseboard management controller, BMC), etc. It should be noted that the specific form of the management controller is not limited in the embodiments of this application; the above are merely illustrative examples. In the following embodiments, an out-of-band management controller (BMC) is used as an example for explanation.
[0046] Among them, the operating system 140 is the basic system program installed in the computing device 10, including but not limited to iOS, Android, Windows, Harmony OS or other operating systems.
[0047] The processor firmware 150, also known as firmware, is essentially a program written into EPROM (Erasable Programmable Read-Only Memory) or EEPROM (Electrically Erasable Programmable Read-Only Memory). Processor firmware 150 refers to the device drivers stored internally in computing device 10. Through firmware, the operating system can implement specific machine operations according to standard device drivers. Examples include firmware for the Basic Input / Output System (BIOS), Management Engine (ME), microcode, or Intelligent Management Unit (IMU). It should be noted that the specific form of processor firmware 150 in this embodiment is not limited; the above is merely illustrative.
[0048] It should be noted that the processor firmware 150 can be located within the processor 110 (e.g., as shown in Figure 1), or the processor firmware 150 can be located in a firmware chip outside the processor 110 (not shown in Figure 1).
[0049] It should be pointed out that the structure shown in Figure 1 does not constitute a limitation on the computing device 10. In addition to the components shown in Figure 1, the computing device 10 may include more or fewer components than illustrated, combine certain components, or have a different component arrangement. For example, the computing device 10 may also include a display screen for displaying images, videos, etc. The display screen includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a microLED, a micro-OLED, a quantum dot light-emitting diode (QLED), etc. In some embodiments, the computing device 10 may include one or N display screens, where N is a positive integer greater than 1. In the embodiments of this application, the display screen may display the risk alarms described in the embodiments of this application to alert the user to potential risks associated with the computing device 10.
[0050] Example 1:
[0051] The following describes in detail, with reference to Figures 2-7 and based on a practical example, a server fault early warning method provided by this application.
[0052] The application scenario of this method is as follows: Suppose a server (hereinafter referred to as server A) malfunctions during use or production. After maintenance personnel locate the root cause of the problem, they rectify server A (e.g., upgrade the processor firmware version, modify the processor firmware configuration) to obtain the latest version of server A to resolve the malfunction. Based on the latest version of server A, the server fault early warning method provided in this application embodiment is implemented. For ease of explanation, the latest version of the server (i.e., server A) is simply referred to as the first server.
[0053] In one possible implementation, the first server can be determined based on its version number. A version number is an identifier for a specific version of a device, software program, file, firmware, device driver, or even hardware; it is a unique number or set of labels. As a new version is released, the version number increases. In this scheme, as the server is updated or modified to a new version, the version number of the updated or modified server increases compared to the version number of the unupdated / unmodified server. Therefore, based on the server's version number, the server with the highest version number is determined as the first server.
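Selecting the first server by highest version number, as described in [0053], can be sketched as below. The sketch assumes dotted numeric version strings such as "1.10.0", which the application does not specify; `parse_version`, `pick_first_server`, and the cluster contents are hypothetical. Versions are compared as integer tuples because a plain string comparison would rank "1.9.3" above "1.10.0".

```python
def parse_version(version):
    """Turn a dotted numeric version such as '1.10.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pick_first_server(versions):
    """Return the name of the server with the highest version number."""
    return max(versions, key=lambda name: parse_version(versions[name]))


# Hypothetical cluster: server name -> version number.
cluster = {"srv-a": "1.10.0", "srv-b": "1.9.3", "srv-c": "1.2.0"}
first = pick_first_server(cluster)
print(first)
```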
[0054] It should be noted that the first server can be the latest version of the server in the server cluster, or any server in the server cluster; this application does not impose specific limitations. However, applying the latest version of the server in the server cluster allows for more accurate early warning of potential risks associated with faults in other servers. This is because if the server fault warning method provided in this application embodiment is not implemented using the latest version of the server, the rectification and warning database may not be up-to-date. That is, faults already occurring in the server cluster may not have been updated to the rectification and warning database in a timely manner, preventing timely risk warnings for already occurring faults. Furthermore, assuming the rectification and warning database is up-to-date, meaning faults occurring in the server cluster have been updated to the rectification and warning database in a timely manner, implementing the server fault warning method provided in this application embodiment based on the latest version of the server can more comprehensively locate servers in the server cluster with fault risks. If the server fault warning method provided in this application embodiment is not implemented based on the latest version of the server, there is a risk of ignoring faults in servers implementing the method, and servers with fault risks in the server cluster cannot be comprehensively located.
[0055] S201. Identify one or more second servers that are located in the same network segment as the first server.
[0056] The second server is a server located in the same network segment as the first server. It should be noted that the number of second servers can be one or more; this application does not impose a specific limitation.
[0057] When the second server and the first server are on the same network segment, then the second server and the first server must be within the same local area network (LAN). Servers within the same LAN are essentially multiple servers communicating through the same switch, and all servers connected to that same switch are within the same LAN.
[0058] When the second server is on the same network segment as the first server, data transmission between the first server and the second server can be achieved without the intervention of a router or a Layer 3 switch. This data transmission can also be called intranet communication.
[0059] In one possible implementation, the first server determines the second server, which is located in the same network segment as the first server, using the packet internet groper (ping) program.
[0060] The ping (packet internet groper) program is a basic tool for testing the connectivity between two servers. The ping program is mainly used to test whether devices can send and receive data normally, thereby determining whether the devices are operating normally and whether the network is smooth.
[0061] Specifically, the first server uses the ping program to ping the IP address of its own network segment (i.e., the first server pings the second server on the same network segment), which sends an ARP (Address Resolution Protocol) packet. The switch connected to the first server receives this ARP packet and forwards it to all ports. Other servers receiving this packet can determine whether they are the server being searched for based on the network segment IP address. If not, they discard the ARP packet without responding (in addition, routers can directly isolate broadcast domains; servers communicating through routers are on different LANs and will generally discard the ARP packet). If they are, they immediately respond with an ARP response packet to the first server; this server is the second server.
[0062] To better understand the relationship between the second server and the first server, the following gives an example with reference to Figure 3, taking server A as the first server.
[0063] The scenario includes server A, server B, server C, server D, server E, a switch, and a router. Servers A, B, C, and D are directly connected to the switch, while server E is connected to the router, which in turn is connected to the switch. Server A sends ARP packets by pinging the IP addresses of its own network segment. The switch receives these ARP packets and forwards them to servers B, C, D, and the router. Servers B, C, and D receive the forwarded ARP packets and reply with ARP response packets. The ARP packets forwarded by the switch to the router are discarded by the router; therefore, server E does not receive ARP packets from server A. Thus, the second servers corresponding to server A are servers B, C, and D.
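The discovery of second servers in S201 can be approximated by probing every host address in the first server's own network segment. The following is a minimal sketch, not the application's implementation: `discover_second_servers` and `ping_once` are hypothetical helpers, the addresses come from a documentation-only range, and `ping_once` assumes a Linux-style `ping` command that accepts `-c` (count) and `-W` (timeout in seconds). The probe is injectable so that the scenario of servers A to E can be simulated without network access.

```python
import ipaddress
import subprocess


def ping_once(address, timeout_s=1):
    """Send one echo request with the system ping (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def discover_second_servers(own_interface, probe=ping_once):
    """Return addresses in the first server's segment that answer the probe."""
    iface = ipaddress.ip_interface(own_interface)
    own_address = iface.ip
    return [
        str(host)
        for host in iface.network.hosts()
        if host != own_address and probe(str(host))
    ]


# Simulated scenario: B, C and D answer; E is behind the router and never replies.
reachable = {"192.0.2.11", "192.0.2.12", "192.0.2.13"}
found = discover_second_servers("192.0.2.10/28", probe=lambda a: a in reachable)
print(found)
```

Passing `probe=ping_once` (the default) would run the real ping program against each address in the segment.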
[0064] S202. Identify a target server from one or more second servers that belongs to the same vendor as the first server.
[0065] The target server is one of the one or more second servers that belongs to the same vendor as the first server.
[0066] Specifically, the first server obtains the vendor identifier (e.g., vendor IP address, vendor name, etc.) of one or more second servers via the IPMI interface. The first server then determines whether the second server and the first server belong to the same vendor based on the obtained vendor identifier. If the vendor identifier of the second server is the same as that of the first server, then the second server is the target server; otherwise, the second server is not the target server. The IPMI interface supports remote monitoring and does not require permission from the server's operating system.
[0067] The server fault early warning method provided in this application identifies, from the one or more second servers, those from the same manufacturer as the first server, in order to exclude servers from a different manufacturer than the first server. Servers from different manufacturers have different internal hardware configurations, firmware configurations, and hardware connections, so the problems that may occur, their location, and their causes are not comparable across manufacturers. Because of these differences, the potential risks of a second server from a different manufacturer cannot be determined from the historical problems of the first server, their location, and their causes. Therefore, this method determines in advance whether each of the one or more second servers belongs to the same manufacturer as the first server, and excludes those from different manufacturers.
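The vendor filtering of S202 can be sketched as follows. The function `select_target_servers`, the vendor names, and the addresses are hypothetical; `get_vendor` stands in for whatever query the first server issues over the IPMI interface (the IPMI Get Device ID command, for instance, reports a manufacturer ID), and here it is replaced by a dictionary lookup so the example runs without hardware.

```python
def select_target_servers(first_vendor, second_servers, get_vendor):
    """Keep only the second servers whose vendor identifier matches the first server's."""
    targets = []
    for address in second_servers:
        vendor = get_vendor(address)  # assumed to return None if the query fails
        if vendor is not None and vendor == first_vendor:
            targets.append(address)
    return targets


# Hypothetical vendor identifiers reported by the second servers.
vendors = {
    "192.0.2.11": "vendor-x",
    "192.0.2.12": "vendor-y",
    "192.0.2.13": "vendor-x",
}
targets = select_target_servers("vendor-x", list(vendors), vendors.get)
print(targets)
```

Servers whose vendor cannot be read are excluded in this sketch, on the assumption that an unknown vendor should be treated like a different vendor.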
[0068] Before performing S202, the first server needs to be authorized to obtain information. Only after the first server is authorized to obtain information can it send a request through the IPMI interface to the second server to obtain the vendor identifier of the second server.
[0069] For example, a pop-up window can be displayed to the user through the first server, which includes an option to allow the first server to obtain information from the second server. In response to the user's operation, the first server is authorized to obtain information from the second server (e.g., the user clicks the option to allow the first server to obtain information from the second server using a peripheral device).
[0070] It should be noted that the first server's authorization to obtain information can be granted permanently in advance; that is, once the first server has been authorized, its authorization to obtain information from the second server is maintained. Alternatively, authorization can be granted anew each time before the first server obtains information from the second server, which can also be called temporary authorization. This application does not impose a specific limitation.
[0071] S203. Obtain the configuration information of the target server.
[0072] Specifically, the first server obtains the target server's configuration information through the IPMI interface.
[0073] The target server's configuration information includes: the target server's model and firmware version (e.g., BMC, BIOS, CPLD, etc.).
[0074] The purpose of the first server obtaining the configuration information of the target server is to compare it with the server configuration information stored in the rectification early warning database, so as to determine whether the target server has a potential risk of problems.
[0075] S204. Compare the obtained configuration information of the target server with the rectification early warning database to determine whether any target server is involved in a rectification issue.
[0076] The rectification early warning database is pre-built based on historical issues (i.e., rectification issues) that occurred with the first server and its corresponding target server, the server configurations involved in those issues, and rectification announcements for those issues. Specifically, the rectification early warning database stores at least historical rectification issues for servers within the same network segment and manufactured by the same vendor, along with the corresponding server configuration information for those historical rectification issues.
[0077] In one possible implementation, the rectification early warning database also includes: rectification announcement information corresponding to historical rectification issues, which may include the rectification announcement address, rectification announcement number, etc.
[0078] The rectification notice corresponding to a rectification issue is the notice previously issued by maintenance personnel for that issue. The rectification notice information indicates the rectification measures for the rectification issue. For example, the complete rectification notice can be retrieved based on the rectification notice address.
[0079] It should be noted that the rectification early warning database can be stored separately on servers located in the same network segment and manufactured by the same vendor, or it can be stored on the latest version of a server located in the same network segment and manufactured by the same vendor. This application does not impose any specific limitations.
[0080] Among them, the target server involved in the rectification issue can also be called the server to be rectified.
[0081] To facilitate understanding, the rectification warning database is illustrated below with reference to Table 1. The rectification issues that occurred on the first server are illustrated using the example of the IBMC repeatedly resetting due to an excessively large intelligent diagnostic database and the BMC malfunction caused by a Hynix chip failure. Server configuration is illustrated using the server model as an example. In one possible implementation, before obtaining the rectification warning database including the two issues mentioned above, the first server has already experienced these two issues. Maintenance personnel are contacted to describe the problems and pinpoint the root causes, obtaining the titles of the issues, the symptoms, and the server configurations involved. The first server is then rectified to resolve the two issues, and a rectification announcement is issued. The rectification warning database is then formed based on the issues, the server configurations involved, and the issued rectification announcement information for the first server.
[0082] Table 1
[0083]
[0084] As shown in Table 1, servers with model numbers 1288H V5, 2288V5, and 2488V5 have a potential risk of repeated IBMC resets due to an excessively large intelligent diagnostic database. Therefore, servers with model numbers 1288H V5, 2288V5, and 2488V5 are subject to rectification. Servers with model numbers 2298V5 and 5288V5 have a potential risk of BMC malfunctions due to Hynix chip failures. Therefore, servers with model numbers 2298V5 and 5288V5 are subject to rectification. If a server's model is not among the affected server models in the rectification warning database, the server is determined not to be subject to rectification.
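As an illustration of the lookup described above, the database can be modeled as a list of records, each naming an issue and the server models it affects. The record layout and the names `RectificationRecord`, `WARNING_DB`, `normalize`, and `issues_for_model` are assumptions made for this sketch; the model numbers and issue descriptions are those stated for Table 1, with model strings normalized by removing spaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RectificationRecord:
    issue: str
    affected_models: frozenset


def normalize(model: str) -> str:
    # Treat "1288H V5" and "1288HV5" as the same model string.
    return model.replace(" ", "").upper()


WARNING_DB = [
    RectificationRecord(
        issue="IBMC repeatedly resets due to an excessively large "
              "intelligent diagnostic database",
        affected_models=frozenset({"1288HV5", "2288V5", "2488V5"}),
    ),
    RectificationRecord(
        issue="BMC malfunction caused by a Hynix chip failure",
        affected_models=frozenset({"2298V5", "5288V5"}),
    ),
]


def issues_for_model(model: str, db=WARNING_DB) -> list:
    """Return the rectification issues that involve the given server model."""
    key = normalize(model)
    return [rec.issue for rec in db if key in rec.affected_models]
```

A model that appears in no record yields an empty list, which corresponds to a server that is not subject to rectification.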
[0085] In one possible implementation, the rectification early warning database also includes the risk level corresponding to the rectification issue. The risk level is pre-classified into different levels based on factors such as the degree of impact of the rectification issue on the server and the difficulty of resolving the issue. For ease of understanding, let's take three risk levels as an example: Level 1 risk level indicates a significant impact on the server, such as potentially causing a server cluster, including that server, to crash; Level 2 risk level indicates a moderate impact, potentially causing a single server to crash; and Level 3 risk level indicates a low impact, potentially causing some functions within the server to malfunction.
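The three-level example above could be represented as an ordered enumeration, so that findings can be sorted with the most severe first. This is only a sketch: the names `RiskLevel` and `most_severe_first` and the server labels in the usage are hypothetical.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """Lower value means more severe, following the three-level example."""
    LEVEL_1 = 1  # may crash a server cluster that includes the server
    LEVEL_2 = 2  # may crash a single server
    LEVEL_3 = 3  # may cause some functions within the server to malfunction


def most_severe_first(findings):
    """Sort (server_id, RiskLevel) pairs so that Level 1 findings come first."""
    return sorted(findings, key=lambda item: item[1])
```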
[0086] If it is determined that there is no target server involved in the rectification issue, proceed to S205.
[0087] If a target server is identified as having issues requiring rectification, proceed to S206.
[0088] S205. Record events on the target server that do not involve rectification issues to the system event log.
[0089] The system event log (SEL log) is used to record the running status and event status of various components in the server.
[0090] S206. Generate risk alerts based on the target servers involved in the rectification issues.
[0091] Based on the rectification early warning database, target servers involved in rectification issues are identified; risk alerts are generated based on the identification information of the target servers involved in rectification issues. The identification information of the target servers can be their IP address, unique identifier, or any other identifying information that can indicate the target server; this application does not impose specific limitations.
[0092] In one possible implementation, the rectification warning database includes, in addition to the rectification issues and the configuration information of the servers involved, rectification announcement information corresponding to the rectification issues. Based on the rectification warning database, the rectification issues corresponding to the target servers involved in the rectification issues and the corresponding rectification announcement information (e.g., the rectification announcement address) can be determined; then, a risk alert is generated based on the target server identification information, the involved rectification issues, and the corresponding rectification announcement information.
[0093] In one possible implementation, the risk alert may also include the configuration information of the target server involved in the rectification issue (e.g., the model of the target server), clearly specifying the model of the server involved in the rectification issue in the risk alert.
[0094] In one possible implementation, the risk alert may also include the risk level of the rectification issues involved in the target server, clearly indicating the extent of the impact of the rectification issues on the server.
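The risk alert variants described for S206 differ only in which optional fields they carry. A minimal sketch, assuming a hypothetical `RiskAlert` structure whose field names are not taken from the application, might look like this:

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class RiskAlert:
    server_id: str                      # IP address or other unique identifier
    issue: Optional[str] = None         # rectification issue involved
    announcement: Optional[str] = None  # e.g. rectification announcement address
    model: Optional[str] = None         # configuration information of the server
    risk_level: Optional[int] = None    # 1 (highest impact) to 3 (lowest impact)

    def to_payload(self) -> dict:
        """Drop unset optional fields so the alert carries only what is known."""
        return {k: v for k, v in asdict(self).items() if v is not None}
```

An alert built from the identifier alone produces a payload with a single key; adding the issue, announcement information, model, or risk level extends the payload without changing its structure.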
[0095] S207. Send the risk alert to the target server.
[0096] In one possible implementation, risk alerts are sent to all target servers to notify users of potential risks associated with those servers. Upon receiving the alert, users can determine whether a target server is involved in rectification issues (i.e., whether a potential risk exists) based on the target server identifier information included in the alert. This proactively informs both the target server and the user of potential risks, eliminating the need to wait for a server failure before responding passively. This addresses the lag in current technologies for fault rectification, providing timely warnings of potential server problems and preventing potential server issues in advance.
[0097] In one possible implementation, risk alerts can be sent to the target server involved in the rectification issue based on its IP address. This alerts the user to potential risks associated with the target server, enabling early prevention of potential server problems and providing accurate warnings to the target server involved in the rectification issue.
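The two delivery options for S207 (notify every target server, or notify only the servers involved in the rectification issue) can be sketched as a single selection function. The function name `alert_recipients`, the `broadcast` flag, and the example addresses are hypothetical.

```python
def alert_recipients(target_servers, affected_servers, broadcast=True):
    """Return the servers that should receive the risk alert.

    broadcast=True  -> every target server is notified.
    broadcast=False -> only the servers involved in the rectification issue.
    """
    if broadcast:
        return list(target_servers)
    affected = set(affected_servers)
    return [s for s in target_servers if s in affected]
```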
[0098] In one possible implementation, the risk alert carries the identifier of the target server involved in the rectification issue, the rectification issue itself, and the rectification announcement information for that issue. Sending the risk alert to the target server alerts the user to the potential risk. Because the alert carries the rectification issue, it not only alerts the user to the potential risk but also clearly indicates which problem may occur. Furthermore, because the alert includes the corresponding rectification announcement information, the user can proactively retrieve the relevant rectification notice and rectify the target server in advance according to it (for example, by upgrading firmware), thereby mitigating the potential risk of the rectification issue. Moreover, the problem description and root-cause localization were already completed when the rectification announcement was produced, so they need not be repeated; this reduces wasted human resources, shortens the rectification cycle, and avoids prolonged impact on business operations and availability.
[0099] The rectification announcement information can be the address information of the rectification announcement, which allows users to obtain the complete information of the rectification announcement based on the address information, while effectively reducing the memory occupied by the rectification announcement information. In addition, the rectification announcement information can also be other types of rectification announcement information, and this application does not make specific limitations.
[0100] In one possible implementation, the risk alert carries the configuration information of the target server involved in the rectification issue (e.g., the target server model, the target server firmware version, etc.), allowing users to understand the configuration information of potential risks of fault issues in advance, so as to take targeted rectification measures in the future.
[0101] In one possible implementation, the risk alert carries the risk level of the rectification issue involving the target server. While alerting the user to the potential risk of a rectification issue on the target server, it also indicates the expected impact of such an issue on the server. This allows the user to formulate a rectification strategy for the target server based on the risk level. For example, risk levels include Level 1, Level 2, and Level 3. If the risk alert indicates a Level 1 risk issue, the user immediately rectifies the target server to prevent significant impact from the rectification issue. In another possible implementation, after a target server involved in a rectification issue is identified, the method may further include step S208, which is optional.
[0102] S208. Based on the rectification issues involved in the target server, rectify the target server.
[0103] Before proceeding with S208, the first server needs to be authorized to perform rectification. Only after the first server is authorized to perform rectification can it implement rectification measures on the target server based on the rectification issues involved.
[0104] For example, a pop-up window can be displayed to the user through the first server. The pop-up window includes an option to allow the first server to rectify the target server. In response to the user's option to allow the first server to rectify the target server, the first server is authorized to rectify the target server.
[0105] It should be noted that the authorization for the first server to perform rectification can be granted permanently in advance; that is, once the first server is authorized to rectify the target server, its authorization status is maintained. Alternatively, the authorization can be granted anew each time before the first server performs rectification, which can also be called temporary authorization. This application does not impose specific limitations.
[0106] In one possible implementation, the rectification warning database in the first server also includes server rectification measures corresponding to the rectification issues. These server measures are rectification actions taken by a server to rectify other servers; that is, one server implements rectification measures on another server, thereby rectifying that server. The first server can search the rectification warning database for the server rectification measures corresponding to the rectification issues involving the target server, and then rectify the target server based on these measures. For example, the target server's configuration can be modified using IPMI or Redfish commands.
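The lookup-then-apply behavior of S208 can be sketched as below. This is an illustration under assumptions: the mapping from issue to measure, the function names `rectify` and `disable_feature`, and the returned strings are hypothetical, and the measure is a placeholder rather than a real IPMI or Redfish call.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping: rectification issue -> action the first server runs on a target.
Measure = Callable[[str], str]


def rectify(target: str, issue: str, measures: Dict[str, Measure],
            authorized: bool) -> Optional[str]:
    """Apply the stored server rectification measure, if permitted and available."""
    if not authorized:
        return None  # S208 must not run before the user authorizes rectification
    measure = measures.get(issue)
    if measure is None:
        return None  # no server rectification measure stored for this issue
    return measure(target)


def disable_feature(target: str) -> str:
    # Placeholder for a configuration change issued over IPMI or Redfish.
    return f"configuration change applied to {target}"
```

Calling `rectify("server-c", "Q", {"Q": disable_feature}, authorized=True)` runs the placeholder measure, while the same call with `authorized=False` does nothing.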
[0107] It should be noted that the role of the first server in rectifying the target server in S208 is generally to delay the occurrence of problems (failures). For example, before the target server is rectified, there is a potential risk of problem A occurring. If the target server continues to operate with the current configuration, problem A is expected to occur within three months. After the first server rectifies the target server, problem A is not expected to occur within two years. The rectification of the target server by the first server may not completely eliminate the potential risk of problem A, but it extends the time before problem A may occur, giving users sufficient time for server upgrades / updates.
[0108] It should also be noted that rectification of the target server by the first server generally only delays the onset of problems because it typically consists of configuration adjustments. For example, a potential risk of malfunction caused by server overload might be mitigated by using IPMI commands to modify the target server's configuration and disable certain functions. A complete solution, however, requires upgrading the target server (e.g., a firmware upgrade), which configuration changes alone cannot achieve: the target server needs, for instance, to connect to the internet, download the necessary upgrade package, and use it to complete the upgrade. The first server cannot control the target server's internet access, as doing so could compromise the security of the data stored on the target server. To ensure data security, users can individually authorize the target server to access the internet, or use an authorized device such as a USB drive to copy the upgrade package to the target server, thus enabling the upgrade.
[0109] To facilitate understanding of the embodiments of this application, an example of the server fault early warning method is described below with reference to Figures 3-6. In this example, server A is the first server.
[0110] First, the scenario shown in Figure 3 includes servers A, B, C, D, and E, a switch, and a router. Servers A, B, C, and D are directly connected to the switch, while server E is connected to the router and then to the switch via the router. Using a ping program, the second servers with IP addresses in the same network segment as server A are identified as servers B, C, and D. As shown in Figure 4, servers A, B, C, and D are on the same network segment.
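The example discovers the second servers with ping, which tests reachability. As a complementary, offline illustration of what "same network segment" means, the sketch below checks subnet membership with Python's standard `ipaddress` module instead; it is a substitute technique, not the ping-based discovery itself. All addresses and the /24 prefix length are hypothetical values chosen for the example.

```python
import ipaddress


def same_segment(first_server_ip: str, prefix_len: int, candidates: dict) -> list:
    """Return names of candidate servers whose IP lies in the first server's segment."""
    network = ipaddress.ip_network(f"{first_server_ip}/{prefix_len}", strict=False)
    return [name for name, ip in candidates.items()
            if ipaddress.ip_address(ip) in network]


# Hypothetical addresses: B, C, and D share server A's /24; E sits behind the router.
candidates = {"B": "192.168.1.12", "C": "192.168.1.13",
              "D": "192.168.1.14", "E": "192.168.2.15"}
second_servers = same_segment("192.168.1.11", 24, candidates)
```

Under these assumed addresses, `second_servers` contains B, C, and D, matching the outcome described for Figure 4.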
[0111] After the user authorizes server A to obtain information about the second servers, server A obtains the vendor identifiers of servers B, C, and D through the IPMI interface (here, the obtained vendor identifiers of servers B, C, and D are 'a', 'a', and 'd', respectively, and that of server A itself is 'a'). By comparing the obtained vendor identifiers with its own, server A determines which of the second servers (servers B, C, and D) share its vendor; these are the target servers, namely servers B and C. As shown in Figure 5, servers A, B, and C belong to the same manufacturer.
[0112] Server A obtains the configuration information of servers B and C and compares it with the rectification warning database to determine whether servers B and C are involved in rectification issues. Taking the rectification warning database shown in Figure 5 as an example, the rectification issue is issue Q, the server versions involved in issue Q are version b1, version b2, and version c1, and the server rectification measure corresponding to issue Q is server rectification measure q. If the obtained version of server B is version b3 and the version of server C is version c1, then it is determined that server C is involved in the rectification issue, while server B is not.
[0113] Server A generates a risk alert based on server C. As shown in Figure 6, the risk alert carries the device IP of server C and the problem Q involving server C. Server A sends this risk alert to servers B and C to alert the user that server C has a potential risk of encountering problem Q.
[0114] After server A sends the risk alert to servers B and C, and the user authorizes server A to perform rectification, server A implements server rectification measure q on server C based on the server rectification measure q corresponding to problem Q in the rectification warning database.
[0115] To further facilitate understanding of the server fault early warning method provided in the embodiments of this application, the method is illustrated below by way of example with reference to Figure 7, building on Figures 3-6. The first server is server A, the server configuration information uses the server version as an example, and the rectification and early warning database is the one shown in Figure 5.
[0116] Server A uses the ping program to ping the network segment IP of server A to determine the second servers, including server B, server C, and server D.
[0117] Determine whether the user has authorized server A to obtain information about the second server. If the user has authorized the server to obtain information about the second server, server A will obtain the vendor identifiers of servers B, C, and D through the IPMI interface; if the user has not authorized the server to obtain information about the second server, server A will wait for the user to authorize server A to obtain information about the second server before obtaining the vendor identifiers of servers B, C, and D through the IPMI interface.
[0118] Based on the vendor identifiers of servers B, C, and D obtained, server A determines that the target servers include: server B and server C.
[0119] Server A obtains the configuration information of Server B and Server C through the IPMI interface.
[0120] Server A compares the configuration information of servers B and C obtained with the rectification warning database to determine whether server B and / or server C are involved in rectification issues. It is determined that server C is involved in rectification issues.
[0121] Server A generates a risk alert based on server C. Taking the risk alert shown in Figure 6 as an example, the risk alert carries the device IP of server C and the problem Q involving server C.
[0122] Server A sends a risk alert to servers B and C to inform the user that server C has a potential risk of encountering problem Q.
[0123] Determine whether the user has authorized server A to rectify server C. If the user has authorized server A to rectify server C, then server A will implement server rectification measure q on server C according to the server rectification measure q corresponding to problem Q in the rectification warning database. If the user has not authorized server A to rectify server C, then wait for the user to authorize server A to rectify server C, and then server A will implement server rectification measure q on server C according to the server rectification measure q corresponding to problem Q in the rectification warning database. This completes the rectification of server C by server A.
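The flow just described can be condensed into a short sketch. It uses the values of the example (issue Q affecting versions b1, b2, and c1, measure q, server B at version b3, server C at version c1); the vendor identifier of server C is taken to be 'a', consistent with servers B and C being the target servers. The function name `early_warning`, the dictionary layout, and the returned keys are hypothetical, and network discovery, IPMI queries, and user prompts are replaced by plain inputs.

```python
def early_warning(first_vendor, vendors, versions, db, rectify_authorized):
    """Walk the example flow: vendor filter, version comparison, alert, rectification."""
    # Keep only second servers whose vendor identifier matches the first server's.
    targets = [s for s, v in vendors.items() if v == first_vendor]
    alerts, actions = [], []
    for server in targets:
        for issue, entry in db.items():
            if versions[server] in entry["versions"]:
                alerts.append({"server": server, "issue": issue})
                if rectify_authorized:
                    actions.append((server, entry["measure"]))
    # In this sketch every target server receives the alerts.
    return {"targets": targets, "alerts": alerts,
            "recipients": targets if alerts else [], "actions": actions}


result = early_warning(
    first_vendor="a",
    vendors={"B": "a", "C": "a", "D": "d"},
    versions={"B": "b3", "C": "c1"},
    db={"Q": {"versions": {"b1", "b2", "c1"}, "measure": "q"}},
    rectify_authorized=True,
)
```

Here `result["alerts"]` names only server C, the alert recipients are servers B and C, and measure q is applied to server C alone, which mirrors the outcome of the walkthrough.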
[0124] This application provides a server fault early warning method, including: identifying a second server located in the same network segment as a first server, and identifying, among the second servers, a target server belonging to the same vendor as the first server; obtaining the configuration information of the target server; comparing the obtained configuration information of the target server with a rectification early warning database to determine whether the target server is involved in a rectification issue; when the target server is involved in a rectification issue, generating a risk alert based on the target server involved in the rectification issue; and sending the risk alert to the target server to alert the user that the target server has a potential risk of encountering the rectification issue. This eliminates the need to wait for the target server to experience a fault (problem) before responding passively, thus solving the lag caused by passively responding to faults and providing timely early warning of potential risks of server problems, thereby preventing potential server problems in advance.
[0125] Furthermore, the target server can utilize the rectification announcement information carried in the risk alert to allow users to obtain the complete rectification announcement for the rectification issue. This enables users to upgrade / rectify the target server in advance based on the rectification announcement before the issue occurs, thereby preventing the rectification issue from arising and avoiding disruption to business operations and availability due to problems with the target server.
[0126] Furthermore, the rectification warning database also includes server rectification measures corresponding to rectification issues. The first server can determine the server rectification measures corresponding to the rectification issues involved in the target server from the rectification warning database. The first server uses the corresponding server rectification measures to rectify the target server, thereby extending the time when the target server may encounter rectification issues and providing users with sufficient time for server upgrades / updates.
[0127] Furthermore, before the first server can obtain configuration information from other servers or rectify other servers, it needs to be authorized to ensure the security of the data on the server.
[0128] Example 2:
[0129] The server fault early warning method provided in this application is described in detail below with reference to Figure 8. The method is applied to a first server.
[0130] It should be noted that the first server can be the latest version of the server in the server cluster, or it can be any server in the server cluster; this application does not impose any specific limitations. Furthermore, having the first server be the latest version of the server in the server cluster allows for a more accurate and comprehensive identification and early warning of servers in the server cluster that are at risk of failure.
[0131] In one possible implementation, the server fault early warning method provided in this application embodiment has specific triggering conditions, and the server fault early warning method provided in this application embodiment is implemented in response to the triggering conditions. The specific triggering conditions may be: a server in the server cluster experiences a fault and, after rectification, that server is designated as the first server, triggering the server fault early warning method; or the server cluster pre-sets a periodic time to periodically trigger the first server to execute the server fault early warning method; or the server cluster adds a new version of the server, triggering the execution of the server fault early warning method. The triggering conditions can be set according to actual conditions, and this application does not impose specific limitations.
[0132] S801. Based on preset rules, determine the target server corresponding to the first server.
[0133] The preset rules are set in advance based on the relevant information of the server cluster and the server itself.
[0134] For example, the preset rule could be that the target server and the first server are on the same network segment, and that the target server and the first server belong to the same manufacturer. Alternatively, the preset rule could also be that the target server and the first server are on the same network segment, or that the target server and the first server belong to the same manufacturer, etc., and this application does not impose specific limitations. Optionally, the preset rule could also be based on a target server input or selected by the user.
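The alternative preset rules (same segment and same vendor, or either condition alone) can be expressed as predicates combined with AND or OR. The sketch below is illustrative only; `ServerInfo`, `select_targets`, the `require_all` flag, and the segment and vendor labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerInfo:
    name: str
    segment: str
    vendor: str


def same_segment(first, other):
    return first.segment == other.segment


def same_vendor(first, other):
    return first.vendor == other.vendor


def select_targets(first, others, rules, require_all=True):
    """Apply preset rules; require_all=True combines them with AND, False with OR."""
    combine = all if require_all else any
    return [o.name for o in others if combine(rule(first, o) for rule in rules)]
```

Passing `[same_segment, same_vendor]` with `require_all=True` reproduces the rule used in the implementation described next, while `require_all=False` admits any server that satisfies at least one rule.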
[0135] In one possible implementation, the preset rule is that the target server and the first server are located on the same network segment, and that the target server and the first server belong to the same vendor. It should be noted that the number of target servers can be one or more, depending on the actual situation; this application does not impose a specific limitation.
[0136] The fact that the target server and the first server are on the same network segment indicates that they communicate directly without the need for routers / switches, ensuring the security of internal server data and preventing data leakage due to router / switch forwarding. The target server and the first server are required to belong to the same manufacturer because differences in hardware configuration, firmware configuration, and hardware connections between servers from different manufacturers lead to incompatibility in terms of potential problems, problem localization, and causes, which would hinder the implementation of the server fault early warning method provided in this application.
[0137] Specifically, identify one or more second servers located in the same network segment as the first server; obtain the vendor identifier of one or more second servers; and based on the vendor identifier of one or more second servers, identify the target server among the one or more second servers that belongs to the same vendor as the first server.
[0138] For example, the first server uses the ping program to determine one or more second servers that are on the same network segment as the first server.
[0139] For example, the first server obtains the vendor identifier of one or more second servers through the IPMI interface.
[0140] S802. Obtain the configuration information of the target server.
[0141] The target server's configuration information includes: the target server's model and firmware version (e.g., BMC, BIOS, CPLD, etc.).
[0142] Specifically, the first server obtains the target server's configuration information through the IPMI interface.
[0143] S803. Based on the configuration information of the target server, use the rectification early warning database to determine whether there are any servers in the target server that need to be rectified.
[0144] The rectification early warning database includes rectification issues and the corresponding server configuration information.
[0145] Specifically, based on the rectification and early warning database, it is determined whether the database contains configuration information for the target server. If the database contains such information, it is determined that there is a server among the target servers that needs rectification. If the database does not contain such information, it is determined that there is no server among the target servers that needs rectification. The presence of target server configuration information in the rectification and early warning database indicates that the target server is involved in a rectification issue; that is, there exists a target server involved in a rectification issue, which can also be referred to as a server awaiting rectification.
[0146] By using the rectification issues and corresponding configuration information stored in the rectification early warning database, the servers to be rectified can be identified. This simplifies the process of identifying servers to be rectified, shortens the time consumed in identifying them, and thus improves the timeliness of early warning of potential risks of rectification issues on servers.
[0147] S804. When it is determined that there are servers to be rectified among the target servers, a risk alarm is generated.
[0148] Specifically, a risk alert is generated based on the device identifier of the server to be rectified. The risk alert explicitly carries this device identifier, so that users can clearly identify which server may have a rectification issue.
[0149] Furthermore, in one possible implementation, the rectification issues corresponding to the server to be rectified are identified based on the rectification warning database; and risk alarms are generated based on the device identifier of the server to be rectified and the rectification issues involved in the server to be rectified.
[0150] Furthermore, in one possible implementation, the rectification early warning database also includes: rectification announcement information for the rectification issues. Based on the rectification early warning database, the rectification issues corresponding to the servers to be rectified are determined; based on the rectification early warning database, the rectification announcement information for the rectification issues corresponding to the servers to be rectified is determined; based on the device identifier of the server to be rectified, the corresponding rectification issue, and the rectification announcement information, a risk alarm is generated. At this time, the risk alarm carries the device identifier of the server to be rectified, the corresponding rectification issue, and the rectification announcement information.
[0151] It should be noted that when it is determined that there are no servers to be rectified in the target server, it means that there is no risk of historical failures in the current server cluster. In this case, the event that there are no servers to be rectified in the target server is recorded in the system event log.
[0152] S805: Send a risk alert to the target server to notify the user that the server to be rectified has potential risks.
[0153] In one possible implementation, risk alerts are sent to all target servers. Based on the device identifier of the server to be rectified carried in the risk alert, the user is alerted that the server to be rectified has potential risks. By sending risk alerts to all target servers, users can clearly identify servers with potential rectification issues based on the device identifier. At the same time, the risk alert also serves as a notification to other servers, meaning that users can proactively conduct a secondary screening for fault risks based on the device identifier of the server to be rectified.
[0154] In one possible implementation, the risk alarm is sent to the server to be rectified based on the device identifier carried in the risk alarm, thus alerting the user to the potential risks associated with the server. This provides accurate alerts about the potential risks of rectification issues on the server to be rectified, eliminating the need for the user to identify the server based on the device identifier.
[0155] Specifically, risk alerts include the device identifier of the server to be rectified and the rectification issues involved. While alerting users to potential risks associated with the server to be rectified, the alerts also clearly provide detailed information about the possible rectification issues.
[0156] Specifically, risk alerts include the device identifier of the server to be rectified, the corresponding rectification issue, and rectification announcement information. Besides informing users of the potential risks associated with the server to be rectified and providing detailed information about the issue, the risk alert also includes the corresponding rectification announcement information. Users can act on this information proactively: they can obtain the relevant rectification notice and rectify the target server in advance (for example, by upgrading firmware) according to the announcement, thereby mitigating the potential risk of the rectification issue. Furthermore, because the problem description and fault localization are already complete once the rectification announcement has been generated, this avoids redundant investigative work, shortens the rectification cycle, and avoids prolonged impacts on business operations and availability.
[0157] In one possible implementation, the rectification early warning database also includes server rectification measures corresponding to rectification issues. Server rectification measures are rectification measures carried out by a server, acting as the executing entity, on other servers; that is, one server performs the rectification measures on another server, thereby rectifying that server. Following S805, the method further includes: determining the rectification issues corresponding to the servers to be rectified based on the rectification early warning database; obtaining the server rectification measures corresponding to the rectification issues based on the rectification early warning database; and using the corresponding server rectification measures to rectify the servers to be rectified. This can effectively delay the occurrence of rectification issues on the servers to be rectified, giving users sufficient time for server upgrades or updates.
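The lookup-and-apply step for server rectification measures can be sketched in Python as follows. This is an illustrative sketch only: the `Measure` signature, the function name `apply_server_measures`, and the idea of representing each measure as a callable keyed by rectification issue are assumptions, not details given in this application.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: a measure receives the device identifier of the
# server to be rectified and returns a short description of what was done.
Measure = Callable[[str], str]


def apply_server_measures(
    device_id: str, issues: Iterable[str], measures: Dict[str, Measure]
) -> List[str]:
    """Apply the stored measure for each issue; skip issues without one."""
    performed = []
    for issue in issues:
        measure = measures.get(issue)
        if measure is None:
            # No server-side measure is recorded for this issue, so it is
            # left for the user to handle via the rectification announcement.
            continue
        performed.append(measure(device_id))
    return performed
```

Issues without a recorded measure are skipped rather than treated as errors, which matches the role of these measures as a stopgap until the user upgrades or updates the server.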
[0158] This application provides a server fault early warning method, comprising: determining a target server corresponding to a first server based on preset rules; obtaining configuration information of the target server; determining, based on the configuration information of the target server and using a rectification early warning database, whether there is a server to be rectified among the target servers; generating a risk alarm when it is determined that there is a server to be rectified among the target servers; and sending the risk alarm to the target server to alert the user that the server to be rectified has potential risks. This method eliminates the need to wait for the target server to experience a fault (problem) before responding passively, thus solving the lag caused by passively responding to faults and providing timely early warning of potential risks to the server, thereby preventing potential server problems in advance.
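The overall flow summarized above can be sketched end to end in Python. The sketch makes several assumptions that are not in the application: configuration information is reduced to a single string, the rectification early warning database is a dictionary from configuration to rectification issue, alarms are sent to all target servers, and the callables `determine_targets`, `get_config`, and `send_alarm` are hypothetical stand-ins for the corresponding steps.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("early_warning")


def run_early_warning(
    determine_targets: Callable[[], Iterable[str]],
    get_config: Callable[[str], str],
    warning_db: Dict[str, str],
    send_alarm: Callable[[str, dict], None],
) -> List[dict]:
    """Run one pass of the early warning flow and return the alarms raised."""
    targets = list(determine_targets())                   # determine target servers
    configs = {dev: get_config(dev) for dev in targets}   # obtain their configuration
    alarms = [                                            # match against the database
        {"device_id": dev, "issue": warning_db[cfg]}
        for dev, cfg in configs.items()
        if cfg in warning_db
    ]
    if not alarms:
        # Nothing to rectify: only record the event, as the description notes.
        logger.info("no server to be rectified among the target servers")
        return []
    for alarm in alarms:                                  # send to every target server
        for dev in targets:
            send_alarm(dev, alarm)
    return alarms
```

Passing the steps in as callables keeps the sketch independent of how targets are discovered or how alarms are transported, neither of which this summary paragraph fixes.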
[0159] Furthermore, risk alerts can carry rectification announcement information corresponding to the rectification issues of the servers to be rectified. This allows users to obtain the complete rectification announcement for the rectification issue based on the rectification announcement information, enabling them to upgrade / rectify the target server in advance based on the rectification announcement before the issue occurs, thereby avoiding the occurrence of rectification issues and preventing the operation and availability of services from being affected by problems with the target server.
[0160] Example 3:
[0161] The following describes in detail, with reference to Figure 9, the server fault early warning device provided by this application.
[0162] Server determination module 901 is used to determine the target server corresponding to the first server based on preset rules;
[0163] Configuration information acquisition module 902 is used to acquire configuration information of the target server;
[0164] The server judgment module 903 is used to determine whether there are servers to be rectified in the target server based on the configuration information of the target server and using the rectification warning database; the rectification warning database includes rectification issues and the server configuration information corresponding to the rectification issues;
[0165] The risk alarm generation module 904 is used to generate a risk alarm when it is determined that there is a server to be rectified in the target server; the risk alarm sending module 905 is used to send the risk alarm to the target server to alert the user that there is a potential risk in the server to be rectified.
[0166] Optionally, the server determination module 901 includes: a second server determination module; a vendor identifier acquisition module; and a target server determination module. The second server determination module is used to determine one or more second servers located in the same network segment as the first server; the vendor identifier acquisition module is used to acquire the vendor identifier of the one or more second servers; and the target server determination module is used to determine, based on the vendor identifier of the one or more second servers, a target server belonging to the same vendor as the first server among the one or more second servers.
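The filtering performed by the three sub-modules can be sketched in Python using the standard `ipaddress` module. The sketch assumes that the candidate servers and their vendor identifiers have already been collected, that a network segment is defined by an IP prefix length, and that vendor identifiers are plain strings; the function name `select_target_servers` and these representations are hypothetical.

```python
import ipaddress
from typing import Dict, List


def select_target_servers(
    first_ip: str,
    prefix_len: int,
    first_vendor: str,
    candidates: Dict[str, str],
) -> List[str]:
    """Return candidate IPs in the first server's segment with the same vendor.

    `candidates` maps a server's IP address to its vendor identifier.
    """
    # strict=False lets a host address (not only a network address) define the segment.
    segment = ipaddress.ip_network(f"{first_ip}/{prefix_len}", strict=False)
    return [
        ip
        for ip, vendor in candidates.items()
        if ipaddress.ip_address(ip) in segment   # second servers: same segment
        and vendor == first_vendor               # target servers: same vendor
    ]
```

Discovering which addresses are reachable and querying each server for its vendor identifier are deliberately left outside the sketch, since they depend on the management interfaces available in a given deployment.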
[0167] Optionally, the server judgment module 903 is specifically used to determine whether the rectification warning database contains configuration information of the target server; if the rectification warning database contains configuration information of the target server, it is determined that there is a server to be rectified among the target servers; if the rectification warning database does not contain configuration information of the target server, it is determined that there is no server to be rectified among the target servers.
[0168] Optionally, the risk alarm generation module 904 is specifically used to generate a risk alarm based on the device identifier of the server to be rectified; the risk alarm carries the device identifier of the server to be rectified.
[0169] Optionally, the risk alarm sending module 905 is specifically used to send risk alarms to all target servers, so as to alert the user that the server to be rectified has potential risks based on the device identifier of the server to be rectified carried in the risk alarm.
[0170] Optionally, the risk alarm sending module 905 is specifically used to send the risk alarm to the server to be rectified based on the device identifier of the server to be rectified carried in the risk alarm, so as to alert the user that there is a potential risk in the server to be rectified.
[0171] Optionally, the device further includes a rectification issue determination module, used to determine the rectification issues corresponding to the server to be rectified based on the rectification early warning database. In this case, the risk alarm generation module 904 is specifically used to generate a risk alarm based on the device identifier of the server to be rectified and the rectification issues involved in the server to be rectified; the risk alarm carries the device identifier of the server to be rectified and the rectification issues involved.
[0172] Optionally, the rectification early warning database also includes: rectification announcement information for rectification issues. The device also includes: a rectification announcement determination module. Specifically, the rectification issue determination module is used to determine the rectification issue corresponding to the server to be rectified based on the rectification early warning database; the rectification announcement determination module is used to determine the rectification announcement information for the rectification issue corresponding to the server to be rectified based on the rectification early warning database; and the risk alarm generation module 904 is specifically used to generate a risk alarm based on the device identifier of the server to be rectified, the corresponding rectification issue, and the rectification announcement information; the risk alarm carries the device identifier of the server to be rectified, the corresponding rectification issue, and the rectification announcement information.
[0173] Optionally, the rectification early warning database also includes server rectification measures corresponding to the rectification issues. The device further includes a rectification measure determination module and a server rectification module. The rectification issue determination module is used to determine the rectification issues corresponding to the servers to be rectified based on the rectification early warning database; the rectification measure determination module is used to obtain the server rectification measures corresponding to the rectification issues based on the rectification early warning database; and the server rectification module is used to rectify the servers to be rectified using the corresponding server rectification measures.
[0174] This application provides a server fault early warning device, comprising: a server determination module 901, used to determine a target server corresponding to a first server based on preset rules; a configuration information acquisition module 902, used to acquire the configuration information of the target server; a server judgment module 903, used to determine whether there is a server to be rectified among the target servers based on the configuration information of the target server and using a rectification early warning database; the rectification early warning database includes rectification issues and server configuration information corresponding to the rectification issues; a risk alarm generation module 904, used to generate a risk alarm when it is determined that there is a server to be rectified among the target servers; and a risk alarm sending module 905, used to send the risk alarm to the target server to alert the user that there is a potential risk in the server to be rectified. This device eliminates the need to wait for the target server to experience a fault (problem) before responding passively, solving the lag caused by passively responding to faults, and providing timely early warning of potential risks to the server, thus preventing potential server problems in advance.
[0175] This application also provides a computer device, which includes a processor and a memory. The processor is connected to the memory, and the memory stores computer execution instructions. When the processor executes the computer execution instructions, it implements the server fault early warning method in the above embodiments.
[0176] This application also provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to execute the server fault warning method described in the above embodiments.
[0177] For explanations of the relevant content and descriptions of the beneficial effects in any of the computer-readable storage media provided above, please refer to the corresponding embodiments described above, which will not be repeated here.
[0178] This application also provides a chip. This chip integrates a control circuit for implementing the functions of the aforementioned server and one or more ports. Optionally, for the functions supported by this chip, refer to the description above; they are not repeated here. Those skilled in the art will understand that all or part of the steps of the above embodiments can be implemented by a program instructing related hardware. The program can be stored in a computer-readable storage medium. The aforementioned storage medium can be a read-only memory, random access memory, etc. The aforementioned processing unit or processor can be a central processing unit, a general-purpose processor, an application-specific integrated circuit (ASIC), a microprocessor, a digital signal processor (DSP), a field-programmable gate array (FPGA), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof.
[0179] This application also provides a computer program product containing instructions that, when executed on a computer, cause the computer to perform any of the methods described in the above embodiments. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flows or functions according to the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions may be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)).
[0180] It should be noted that the devices for storing computer instructions or computer programs provided in the embodiments of this application, such as but not limited to the memory, computer-readable storage medium and communication chip, are all non-transitory.
[0181] The above embodiments can be implemented, in whole or in part, through software, hardware, firmware, or any combination thereof. When implemented using software programs, they can be implemented, in whole or in part, in the form of a computer program product. This computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the flows or functions according to the embodiments of this application are generated. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, computer instructions can be transmitted from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media can be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid-state drives (SSDs)).
[0182] Although this application has been described herein in conjunction with various embodiments, those skilled in the art, by reviewing the accompanying drawings, the disclosure, and the appended claims, will understand and implement other variations of the disclosed embodiments in carrying out the claimed application. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude multiple instances. A single processor or other unit can implement several functions listed in the claims. While different dependent claims may recite certain measures, this does not mean that these measures cannot be combined to produce good results.
[0183] Although this application has been described in conjunction with specific features and embodiments, it is obvious that various modifications and combinations can be made thereto without departing from the spirit and scope of this application. Accordingly, this specification and drawings are merely exemplary illustrations of this application as defined by the appended claims, and are considered to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. Clearly, those skilled in the art can make various alterations and modifications to this application without departing from the spirit and scope of this application. Thus, if such modifications and variations of this application fall within the scope of the claims of this application and their equivalents, this application is also intended to include such modifications and variations.
Claims
1. A server fault early warning method, characterized in that it is applied to a first server and includes: determining, based on preset rules, the target server corresponding to the first server; obtaining the configuration information of the target server; determining, based on the configuration information of the target server and using a rectification early warning database, whether there is a server to be rectified among the target servers, wherein the rectification early warning database includes rectification issues and server configuration information corresponding to the rectification issues; generating a risk alarm when it is determined that there is a server to be rectified among the target servers; and sending the risk alarm to the target server to alert the user that the server to be rectified has potential risks; wherein the preset rules include: the target server and the first server are in the same network segment, and the target server and the first server belong to the same manufacturer; the rectification early warning database also includes server rectification measures corresponding to the rectification issues; and the method also includes: determining, based on the rectification early warning database, the rectification issues corresponding to the server to be rectified; obtaining, based on the rectification early warning database, the server rectification measures corresponding to the rectification issues; and rectifying the server to be rectified using the corresponding server rectification measures.
2. The method according to claim 1, characterized in that the step of determining whether there is a server to be rectified among the target servers based on the configuration information of the target server and using the rectification early warning database includes: determining whether the configuration information of the target server exists in the rectification early warning database; if the configuration information of the target server exists in the rectification early warning database, determining that there is a server to be rectified among the target servers; and if the configuration information of the target server does not exist in the rectification early warning database, determining that there is no server to be rectified among the target servers.
3. The method according to claim 1, characterized in that generating the risk alarm includes: generating the risk alarm based on the device identifier of the server to be rectified, wherein the risk alarm carries the device identifier of the server to be rectified.
4. The method according to claim 3, characterized in that sending the risk alarm to the target server includes: sending the risk alarm to all target servers, so as to alert the user, based on the device identifier of the server to be rectified carried in the risk alarm, that the server to be rectified has potential risks.
5. The method according to claim 3, characterized in that sending the risk alarm to the target server includes: sending the risk alarm to the server to be rectified based on the device identifier of the server to be rectified carried in the risk alarm, so as to alert the user that the server to be rectified has potential risks.
6. The method according to claim 1, characterized in that generating the risk alarm includes: determining, based on the rectification early warning database, the rectification issues corresponding to the server to be rectified; and generating the risk alarm based on the device identifier of the server to be rectified and the rectification issues involved in the server to be rectified, wherein the risk alarm carries the device identifier of the server to be rectified and the rectification issues involved.
7. The method according to claim 1, characterized in that the rectification early warning database also includes rectification announcement information for the rectification issues, and generating the risk alarm includes: determining, based on the rectification early warning database, the rectification issue corresponding to the server to be rectified; determining, based on the rectification early warning database, the rectification announcement information for the rectification issue corresponding to the server to be rectified; and generating the risk alarm based on the device identifier of the server to be rectified, the corresponding rectification issue, and the rectification announcement information, wherein the risk alarm carries the device identifier of the server to be rectified, the corresponding rectification issue, and the rectification announcement information.
8. The method according to any one of claims 1-7, characterized in that the step of determining the target server corresponding to the first server based on preset rules includes: identifying, based on a ping procedure, one or more second servers that are in the same network segment as the first server; obtaining the vendor identifier of the one or more second servers through the IPMI interface; and determining, based on the vendor identifier of the one or more second servers, the target server among the one or more second servers that belongs to the same vendor as the first server.
9. A computing device, characterized in that it includes: a processor, and a memory communicatively connected to the processor; the memory is used to store computer-executable instructions; and the processor is used to execute the computer-executable instructions stored in the memory to implement the server fault early warning method according to any one of claims 1-8.