A method, system, apparatus, and medium for avoiding BMC restarts in a restart test

By setting up efficient prevention and protection mechanisms for the BMC and monitoring data and communication flows in real time, the problem of hardware monitoring data loss caused by BMC restarts is solved, ensuring the stable operation of the server in extreme environments.

CN116340057B · Active · Publication Date: 2026-06-26 · INSPUR SUZHOU INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
INSPUR SUZHOU INTELLIGENT TECH CO LTD
Filing Date
2023-03-31
Publication Date
2026-06-26

Abstract

The application provides a method, system, device and medium for avoiding BMC restart in a restart test, the method comprising: starting a restart test under a current system of an AMD server; checking whether there is abnormal information related to the BMC; if there is abnormal information, pulling a potential signal for controlling the BMC restart, simultaneously obtaining restart information of the system through a CPLD, determining whether the BMC performs a restart operation by comparing the two types of information; monitoring a data flow of a data pipeline between the BMC and BIOS in real time, and controlling an acquisition operation of the data flow according to a first control mechanism; monitoring an in-band communication data flow of the BMC in real time, and controlling the in-band communication data flow according to a second control mechanism. The application sets an efficient prevention and protection mechanism for the BMC, and ensures that the AMD server maintains a high-efficiency and stable running state of the BMC when operating in an extreme use environment.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and more specifically to a method, system, apparatus, and medium for preventing BMC restart during a restart test.

Background Technology

[0002] AMD servers are highly favored by customers and consumers in the market due to their superior performance and excellent price-performance ratio. Servers currently using the Rome series CPU architecture are particularly popular for their high-speed processing capabilities. Servers, as the fundamental carriers of information systems, play a crucial role, especially in the massive storage of information. To ensure stable server operation, real-time monitoring of critical server hardware and its usage is necessary, a task typically accomplished by the baseboard management controller (BMC), which serves as the server's hardware monitoring and control system. The BMC therefore plays a vital role in monitoring critical hardware, making it especially important for developers to ensure the stable operation of the BMC firmware.

[0003] However, during current system stability testing, continuous system restarts are required, and BMC restarts frequently occur during this process. A BMC restart can lead to the loss of customer hardware monitoring data. If the server malfunctions at this time, abnormal hardware cannot be quickly located or diagnosed, posing an irreparable risk to the customer's usage.

[0004] In addition, during the continuous restart testing process, the BIOS will also continuously synchronize important hardware or configuration information to the BMC. This will cause the BMC to receive information constantly. If the BIOS or server system crashes, it may also cause the BMC to restart, thereby losing the monitoring of important hardware.

Summary of the Invention

[0005] To address the above issues, the present invention aims to provide a method, system, device, and medium for preventing BMC restarts during restart testing. By setting up an efficient prevention and protection mechanism for the BMC, it ensures that the AMD server maintains a highly efficient and stable BMC operation state when operating under extreme conditions.

[0006] To achieve the above objectives, the present invention employs the following technical solution:

[0007] In a first aspect, the present invention discloses a method for avoiding BMC restart during a restart test, comprising: powering on the AMD server and initiating a restart test under the current system;

[0008] Check if there are any BMC-related anomalies in the current system;

[0009] If abnormal information is found, the potential signal used to control the BMC restart is pulled as the first reference information, and the system restart information is obtained through CPLD as the second reference information. By comparing the first reference information and the second reference information, it is determined whether the BMC should perform a restart operation.

[0010] Real-time monitoring of the data flow in the data pipeline between the BMC and BIOS, and control over the acquisition of the data flow according to the first control mechanism;

[0011] Monitor the in-band communication data stream of the BMC in real time and control the in-band communication data stream according to the second control mechanism.

[0012] Furthermore, checking whether there is any BMC-related abnormal information in the current system includes:

[0013] Run the main process through BMC, capture the main process, and check if any abnormal monitoring logs are generated;

[0014] Examine the potential signal that controls the BMC restart and determine whether there are any abnormal fluctuations in the voltage value of the potential signal.

[0015] Furthermore, the abnormal information related to the BMC includes:

[0016] Abnormal monitoring logs generated by the BMC main process, and abnormal fluctuations of the potential signal controlling the BMC restart.

[0017] Furthermore, determining whether the BMC should perform a restart operation by comparing the first reference information and the second reference information includes:

[0018] If both the first reference information and the second reference information are signals that control the BMC to perform a restart operation, then the BMC will perform a restart operation.

[0019] Furthermore, the first control mechanism includes:

[0020] If the acquisition time of the current data stream exceeds the time threshold, it is determined to be an acquisition anomaly and the data stream is deleted, and the next data stream is acquired.

[0021] If the acquisition time of the next data stream exceeds the time threshold, it is determined to be an acquisition anomaly and the data interaction is stopped for a preset time before the data stream acquisition operation continues.

[0022] If the number of abnormal acquisitions exceeds the alarm threshold, the data stream acquisition operation will be stopped and a prompt message will be reported.

[0023] Furthermore, the second control mechanism includes:

[0024] If the in-band communication data stream of the BMC exceeds the preset value, then the in-band communication of the BMC will be rate-limited.

[0025] If the data obtained by the BMC's in-band communication is the same twice in a row, the corresponding data will be deleted.

[0026] If the data acquired by the BMC's in-band communication is the same twenty times in a row, the data transmission speed of the BMC's in-band communication will be reduced.

[0027] Furthermore, the time threshold is twenty seconds, and the alarm threshold is ten times.

[0028] In a second aspect, the present invention also discloses a system for avoiding BMC restart during restart testing, comprising:

[0029] The test startup module is used to power on the AMD server and perform a restart test under the current system.

[0030] The information acquisition module is used to check whether there are any BMC-related abnormal information in the current system;

[0031] The restart determination module is used to pull the potential signal used to control the restart of BMC as the first reference information when there is abnormal information, and at the same time obtain the system restart information through CPLD as the second reference information. By comparing the first reference information and the second reference information, it is determined whether BMC should perform a restart operation.

[0032] The data transmission control module is used to monitor the data flow of the data pipeline between the BMC and BIOS in real time, and to control the acquisition operation of the data flow according to the first control mechanism.

[0033] The in-band communication control module is used to monitor the in-band communication data stream of the BMC in real time and control the in-band communication data stream according to the second control mechanism.

[0034] In a third aspect, the present invention also discloses a device for preventing BMC restart during a restart test, comprising:

[0035] A memory, used to store the program for avoiding BMC restart during the restart test;

[0036] A processor that, when executing the program for avoiding BMC restart during the restart test, implements the steps of the method for avoiding BMC restart during the restart test as described in any of the above.

[0037] In a fourth aspect, the present invention also discloses a readable storage medium storing a program for avoiding BMC restart during a restart test, wherein when the program is executed by a processor, it implements the steps of the method for avoiding BMC restart during a restart test as described in any of the above.

[0038] Compared with existing technologies, the advantages of this invention are as follows: This invention discloses a method, system, device, and readable storage medium to prevent BMC restarts during restart testing. By improving the BMC's own anti-interference capability and making the data path through which the BMC receives abnormal information from the operating system less prone to blockage, it prevents the BMC from performing a restart operation due to data transmission anomalies. Through these two measures, corresponding protection is provided for the BMC when the server is used in extreme environments.

[0039] This invention, by setting up an efficient prevention and protection mechanism for the BMC, can ensure that AMD servers maintain a high-efficiency and stable operating state when operating in extreme environments, thus helping customers obtain a better user experience for AMD CPU platform server services.

[0040] Therefore, it is evident that the present invention has outstanding substantive features and significant progress compared with the prior art, and the beneficial effects of its implementation are also obvious.

Attached Figure Description

[0041] To more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on the provided drawings without creative effort.

[0042] Figure 1 is a flowchart illustrating a specific embodiment of the present invention.

[0043] Figure 2 is a system structure diagram of a specific embodiment of the present invention.

[0044] In the diagram, 1 is the test startup module; 2 is the information acquisition module; 3 is the restart determination module; 4 is the data transmission control module; and 5 is the in-band communication control module.

Detailed Implementation

[0045] The core of this invention is to provide a method to avoid BMC restarts during restart testing. In related technologies, during system stability testing, continuous restart testing is required, during which BMC restarts frequently occur. Once a BMC restart occurs, it can lead to the loss of customer hardware monitoring data. If the server malfunctions at this time, abnormal hardware cannot be quickly located or diagnosed, posing an irreparable risk to the customer. Furthermore, during continuous restart testing, the BIOS constantly synchronizes important hardware or configuration information to the BMC. This causes the BMC to continuously receive information; if the BIOS or server system crashes, it may also lead to a BMC restart, resulting in the loss of monitoring of critical hardware.

[0046] The method for preventing BMC restart during the restart test provided by this invention first involves powering on the AMD server and initiating a restart test under the current system. At this time, it checks for any BMC-related anomalies in the current system. If anomalies are found, the potential signal used to control the BMC restart is retrieved as the first reference information, and the system restart information is obtained through the CPLD as the second reference information. By comparing the first and second reference information, it is determined whether the BMC should perform a restart operation. Then, the data flow in the data pipeline between the BMC and BIOS is monitored in real time, and the acquisition of the data flow is controlled according to the first control mechanism; simultaneously, the in-band communication data flow of the BMC is monitored in real time, and the in-band communication data flow is controlled according to the second control mechanism. Therefore, this invention, by setting up an efficient prevention and protection mechanism for the BMC, can ensure that the AMD server maintains a highly efficient and stable BMC operation under extreme operating environments.

[0047] To enable those skilled in the art to better understand the present invention, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. Obviously, the described embodiments are merely some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0048] As shown in Figure 1, this invention discloses a method for avoiding BMC restart during restart testing, comprising the following steps:

[0049] S1: Power on the AMD server and start a restart test under the current system.

[0050] First, power on the AMD server and run it. Then, after the AMD server has finished booting up, continuously perform restart tests within the operating system.

[0051] S2: Check if there are any BMC-related abnormal information in the current system.

[0052] It should be noted that the abnormal information related to BMC in this method includes two types of information: one is the abnormal monitoring log generated by the BMC main process, and the other is the abnormal fluctuation of the potential signal that controls the restart of BMC.

[0053] Specifically, first, the main process of the BMC is run, and the process is captured to check for any abnormal monitoring logs. Then, the potential signal controlling the BMC restart is examined to determine if there are any abnormal fluctuations in its voltage value. If abnormal monitoring logs are generated or the potential signal controlling the BMC restart shows abnormal fluctuations, it can be confirmed that there is BMC-related abnormal information in the current system.
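The check described in step S2 can be illustrated with a minimal Python sketch. The source does not define what makes a log entry or a voltage reading abnormal, so the marker strings, the nominal voltage, and the allowed deviation below are hypothetical placeholders, as are all function names.

```python
from typing import Sequence

# Hypothetical values: the source does not say what counts as "abnormal".
NOMINAL_VOLTS = 3.3
MAX_DEVIATION_VOLTS = 0.3
ABNORMAL_LOG_MARKERS = ("error", "watchdog", "timeout")


def has_abnormal_log(log_lines: Sequence[str]) -> bool:
    """Return True if any captured main-process log line contains a marker."""
    return any(
        marker in line.lower()
        for line in log_lines
        for marker in ABNORMAL_LOG_MARKERS
    )


def has_abnormal_fluctuation(samples: Sequence[float]) -> bool:
    """Return True if any voltage sample strays too far from the nominal level."""
    return any(abs(v - NOMINAL_VOLTS) > MAX_DEVIATION_VOLTS for v in samples)


def bmc_anomaly_present(log_lines: Sequence[str], samples: Sequence[float]) -> bool:
    # Either condition alone is sufficient, matching the "or" in the text.
    return has_abnormal_log(log_lines) or has_abnormal_fluctuation(samples)
```

In this sketch either condition on its own marks the system as containing BMC-related abnormal information, which mirrors the "or" in the paragraph above.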

[0054] S3: If abnormal information exists, pull the potential signal used to control the BMC restart as the first reference information, and at the same time obtain the system restart information through CPLD as the second reference information. By comparing the first reference information and the second reference information, determine whether the BMC should perform a restart operation.

[0055] Specifically, after obtaining the first reference information and the second reference information, if both the first reference information and the second reference information are signals that control the BMC to perform a restart operation, then the BMC performs a restart operation.

[0056] In a specific implementation, if abnormal information exists, the BMC retrieves the potential information of the corresponding set point. This potential information controls the ground potential signal for BMC restart and reflects whether the related voltage and current remain in a stable operating state. In addition to identifying this potential information, the system restart information obtained by the BMC through the CPLD must also be identified, and the contents of the two types of information are compared. Interference from a system restart is not expected to cause both types of information to indicate a BMC restart operation within the same time interval. Therefore, if both types of information are determined to indicate a BMC restart operation, the BMC restart operation is performed; otherwise, no restart operation is performed. Through this mechanism, abnormal interference from system restarts on the BMC can be effectively eliminated.
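The comparison in step S3 amounts to requiring agreement between two independent sources. The following Python sketch illustrates one possible reading, assuming each reference carries a restart flag and a timestamp. The `Reference` type, the function name, and the one-second agreement window are hypothetical; the source mentions only "a certain time interval".

```python
from dataclasses import dataclass

AGREEMENT_WINDOW_S = 1.0  # hypothetical; the source gives no interval length


@dataclass(frozen=True)
class Reference:
    indicates_restart: bool  # True if this source signals a BMC restart
    observed_at: float       # observation time in seconds


def should_restart_bmc(
    first: Reference,
    second: Reference,
    window_s: float = AGREEMENT_WINDOW_S,
) -> bool:
    """Approve a BMC restart only if both references agree within the window.

    `first` stands for the potential signal and `second` for the restart
    information obtained through the CPLD.
    """
    both_indicate = first.indicates_restart and second.indicates_restart
    close_in_time = abs(first.observed_at - second.observed_at) <= window_s
    return both_indicate and close_in_time
```

A single disturbed source is therefore not enough to trigger a restart in this sketch, which is the filtering effect the paragraph above attributes to the mechanism.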

[0057] S4: Monitor the data flow in the data pipeline between the BMC and BIOS in real time, and control the acquisition operation of the data flow according to the first control mechanism.

[0058] Specifically, the system monitors the data flow in the data pipeline between the BMC and BIOS in real time. If the acquisition time of the current data flow exceeds twenty seconds, it is considered an acquisition anomaly, and the data flow is deleted before acquiring the next data flow. If the acquisition time of the next data flow exceeds twenty seconds, it is considered an acquisition anomaly, and data interaction is stopped for a preset time before resuming data flow acquisition. Furthermore, if the number of acquisition anomalies exceeds ten, the data flow acquisition operation is stopped, and a warning message is reported.

[0059] In a specific implementation, if the BIOS hangs or crashes at a certain interface, data interaction with the BMC stalls at a specific point. Waiting too long for pipeline information can appear as congestion and cause the BMC to restart. To address this issue, this step adds a data flow check to the information pipeline. It measures the acquisition time of each data stream; if the acquisition time exceeds a set threshold (e.g., 20 seconds), the acquisition is considered abnormal, and the data is promptly deleted. The next data stream is then acquired. If it still cannot be acquired within the specified time, an abnormality is determined, interaction is stopped, and interaction is attempted again after a period of time. When the number of abnormal acquisitions exceeds ten, data acquisition ceases and a prompt message is reported.
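The first control mechanism can be sketched in Python as a small decision loop over measured acquisition times. The twenty-second threshold and the limit of ten anomalies come from the text; the five-second pause, the action labels, and the function name are hypothetical, and the pause is injected as a callable so the logic can be exercised without real waiting.

```python
from typing import Callable, Iterable, List

TIME_THRESHOLD_S = 20.0  # from the text
ALARM_THRESHOLD = 10     # from the text
PAUSE_S = 5.0            # hypothetical "preset time"


def control_acquisition(
    durations: Iterable[float],
    pause: Callable[[float], None],
) -> List[str]:
    """Return the action taken for each data stream's acquisition time."""
    actions: List[str] = []
    anomalies = 0
    previous_was_anomaly = False
    for elapsed in durations:
        if elapsed <= TIME_THRESHOLD_S:
            actions.append("accept")
            previous_was_anomaly = False
            continue
        anomalies += 1
        if anomalies > ALARM_THRESHOLD:
            # Too many anomalies: stop acquiring and report a prompt message.
            actions.append("stop_and_report")
            break
        if previous_was_anomaly:
            # Back-to-back anomaly: suspend interaction, then carry on.
            pause(PAUSE_S)
            actions.append("pause_then_retry")
        else:
            # First anomaly: delete this stream and move to the next one.
            actions.append("discard")
        previous_was_anomaly = True
    return actions
```

For example, `control_acquisition([1.0, 25.0, 30.0, 2.0], time.sleep)` would accept the first stream, discard the second, pause before continuing after the third, and accept the fourth.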

[0060] S5: Monitor the in-band communication data stream of BMC in real time and control the in-band communication data stream according to the second control mechanism.

[0061] Specifically, the BMC's in-band communication data stream is monitored in real time. If the BMC's in-band communication data stream exceeds a preset value, the in-band communication of the BMC is rate-limited. If the data obtained by the BMC's in-band communication is the same twice in a row, the corresponding data is deleted. If the data obtained by the BMC's in-band communication is the same twenty times in a row, the data transmission speed of the BMC's in-band communication is reduced.
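The duplicate handling in the second control mechanism can be illustrated with a short Python sketch. It assumes in-band messages arrive as byte strings and that "the same twice in a row" means equal to the immediately preceding message; the function name and return shape are hypothetical, and the traffic-volume check is left out of this sketch.

```python
from typing import Iterable, List, Optional, Tuple

SLOWDOWN_REPEATS = 20  # from the text: same data twenty times in a row


def filter_in_band(messages: Iterable[bytes]) -> Tuple[List[bytes], bool]:
    """Drop consecutive duplicates and flag when a slowdown is warranted.

    Returns the retained messages and a flag that is True once any message
    has been seen SLOWDOWN_REPEATS times in a row.
    """
    kept: List[bytes] = []
    slow_down = False
    previous: Optional[bytes] = None
    run_length = 0
    for msg in messages:
        if msg == previous:
            run_length += 1
            if run_length >= SLOWDOWN_REPEATS:
                slow_down = True
            continue  # identical to the previous message: delete it
        previous = msg
        run_length = 1
        kept.append(msg)
    return kept, slow_down
```

The caller would reduce the in-band transmission speed when the returned flag is set; how that reduction is carried out is not specified in the source.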

[0062] In a specific implementation, server system restarts continuously impact BMC in-band communication; these constant in-band restarts can cause congestion in internal channels, as both in-band acquisition and BMC operation run on the IPMI main thread. To avoid this problem, this step first monitors the data stream used by the system to communicate with the BMC. This can be done by capturing packets over a certain period of time, then calculating the amount of captured data.

[0063] At this point, the size and frequency of in-band data exchange can be determined by comparing it with the size of a pre-defined interactive data stream.

[0064] If it can be determined that data exchange is too slow or too frequent, corresponding restrictions are applied, and only data carrying important information is accepted. Furthermore, each time data is acquired, it is compared with the previously acquired data; if they are identical, the data is promptly and completely deleted. If the same data appears twenty times consecutively, information is filtered and acquired at a normal rate according to project requirements.
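The measurement described above, capturing packets over a period of time and comparing the captured amount with a preset size, can be sketched in Python as a sliding window over packet sizes. The class name, the window length, and the byte limit are hypothetical; the source does not give concrete values or a capture method.

```python
from collections import deque
from typing import Deque, Tuple


class TrafficWindow:
    """Track bytes seen in the last `window_s` seconds of in-band traffic."""

    def __init__(self, window_s: float, max_bytes: int) -> None:
        self.window_s = window_s
        self.max_bytes = max_bytes  # the preset value to compare against
        self._packets: Deque[Tuple[float, int]] = deque()
        self._total = 0

    def _evict(self, now: float) -> None:
        # Forget packets that have fallen out of the measurement window.
        while self._packets and now - self._packets[0][0] > self.window_s:
            _, size = self._packets.popleft()
            self._total -= size

    def record(self, now: float, size: int) -> None:
        """Register one captured packet of `size` bytes at time `now`."""
        self._packets.append((now, size))
        self._total += size
        self._evict(now)

    def should_limit(self, now: float) -> bool:
        """Return True if the windowed traffic exceeds the preset value."""
        self._evict(now)
        return self._total > self.max_bytes
```

A caller would feed each captured packet to `record` and apply its own restriction policy whenever `should_limit` returns True.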

[0065] This embodiment provides a method to avoid BMC restarts during restart testing. By improving the BMC's own anti-interference capabilities and making the data path for the BMC to receive abnormal information from the operating system less prone to blockage, the BMC is prevented from performing a restart operation due to data transmission anomalies. These two methods provide corresponding protection measures for the BMC during extreme server environments, ensuring that AMD servers maintain efficient and stable operation under extreme usage conditions.

[0066] As shown in Figure 2, the present invention also discloses a system for avoiding BMC restart during restart testing, comprising: a test startup module 1, an information acquisition module 2, a restart determination module 3, a data transmission control module 4, and an in-band communication control module 5.

[0067] Test startup module 1 is used to power on the AMD server and start a restart test under the current system.

[0068] In a specific implementation, test startup module 1 is specifically used to: perform a power-on and startup operation on the AMD server. After the AMD server has finished booting up, a restart test is continuously performed under the operating system.

[0069] Information acquisition module 2 is used to check whether there is any BMC-related abnormal information in the current system.

[0070] In a specific implementation, the information acquisition module 2 is specifically used to: capture the main process running through the BMC and check if any abnormal monitoring logs are generated; check the potential signal controlling the BMC restart and determine if the voltage value of the potential signal has abnormal fluctuations; if abnormal monitoring logs are generated or the potential signal controlling the BMC restart has abnormal fluctuations, it can be determined that there is BMC-related abnormal information in the current system.

[0071] The restart determination module 3 is used to pull the potential signal used to control the restart of the BMC as the first reference information when there is abnormal information, and at the same time obtain the system restart information through the CPLD as the second reference information. By comparing the first reference information and the second reference information, it is determined whether the BMC should perform a restart operation.

[0072] In a specific implementation, the restart determination module 3 is specifically used to: when abnormal information exists, pull the potential signal used to control the BMC restart as the first reference information, and simultaneously obtain the system restart information through the CPLD as the second reference information. After obtaining the first and second reference information, if both the first and second reference information are signals controlling the BMC to perform a restart operation, then the BMC performs the restart operation. Otherwise, the BMC restart operation is not performed.

[0073] The data transmission control module 4 is used to monitor the data flow of the data pipeline between the BMC and the BIOS in real time, and to control the acquisition operation of the data flow according to the first control mechanism.

[0074] In a specific implementation, the data transmission control module 4 is specifically used to: monitor the data flow of the data pipeline between the BMC and BIOS in real time; if the acquisition time of the current data flow exceeds twenty seconds, it is determined to be an acquisition anomaly, and the data flow is deleted, and the next data flow is acquired; if the acquisition time of the next data flow exceeds twenty seconds, it is determined to be an acquisition anomaly, and the data interaction is stopped for a preset time before continuing the data flow acquisition operation. If the number of acquisition anomalies exceeds ten, the data flow acquisition operation is stopped, and a prompt message is reported.

[0075] The in-band communication control module 5 is used to monitor the in-band communication data stream of the BMC in real time and control the in-band communication data stream according to the second control mechanism.

[0076] In a specific implementation, the in-band communication control module 5 is specifically used for: real-time monitoring of the in-band communication data stream of the BMC; if the traffic of the in-band communication data stream of the BMC exceeds a preset value, then performing a flow limiting operation on the in-band communication of the BMC; if the data obtained by the in-band communication of the BMC is the same twice in a row, then deleting the corresponding data; if the data obtained by the in-band communication of the BMC is the same twenty times in a row, then reducing the data transmission speed of the in-band communication of the BMC.

[0077] This embodiment provides a system that avoids BMC restart during restart testing. By setting up an efficient prevention and protection mechanism for the BMC, it can ensure that AMD servers maintain a high-efficiency and stable operating state when operating under extreme usage environments, helping AMD CPU platform server services to achieve a better user experience for customers.

[0078] The present invention also discloses an apparatus for avoiding BMC restart during a restart test, comprising a processor and a memory; wherein, when the processor executes the program for avoiding BMC restart during a restart test stored in the memory, it performs the following steps:

[0079] 1. Power on the AMD server and start a restart test under the current system.

[0080] 2. Check if there are any BMC-related anomalies in the current system.

[0081] 3. If abnormal information is found, the potential signal used to control the BMC restart is pulled as the first reference information, and the system restart information is obtained through CPLD as the second reference information. By comparing the first reference information and the second reference information, it is determined whether the BMC should perform a restart operation.

[0082] 4. Monitor the data flow of the data pipeline between the BMC and BIOS in real time, and control the acquisition operation of the data flow according to the first control mechanism.

[0083] 5. Monitor the in-band communication data stream of BMC in real time and control the in-band communication data stream according to the second control mechanism.
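The five steps above can be tied together as in the following Python sketch, assuming each step is supplied as a callable. All names are hypothetical, and the sketch shows only the order of the steps and the condition under which the restart decision is evaluated, not how any step is implemented.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RestartTestSteps:
    start_restart_test: Callable[[], None]     # step 1
    anomaly_present: Callable[[], bool]        # step 2
    decide_bmc_restart: Callable[[], bool]     # step 3
    control_bios_pipeline: Callable[[], None]  # step 4
    control_in_band: Callable[[], None]        # step 5


def run_cycle(steps: RestartTestSteps) -> bool:
    """Run steps 1 to 5 once; return True if a BMC restart was approved."""
    steps.start_restart_test()
    # The restart decision is evaluated only when abnormal information exists.
    restart = steps.decide_bmc_restart() if steps.anomaly_present() else False
    steps.control_bios_pipeline()
    steps.control_in_band()
    return restart
```

Passing the steps in as callables keeps the sequencing separate from the hardware-specific details, which the source leaves to the implementer.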

[0084] The device used to prevent BMC restart in the restart test provided in this embodiment may include, but is not limited to, smartphones, tablets, laptops, or desktop computers.

[0085] The processor may include one or more processing cores, such as a quad-core processor or an octa-core processor. The processor can be implemented using at least one hardware form of Digital Signal Processor (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor may also include a main processor and coprocessors. The main processor, also known as the Central Processing Unit (CPU), is used to process data in the wake-up state; the coprocessor is a low-power processor used to process data in the standby state. In some embodiments, the processor may integrate a Graphics Processing Unit (GPU), which is responsible for rendering and drawing the content to be displayed on the screen. In some embodiments, the processor may also include an Artificial Intelligence (AI) processor, which handles computational operations related to machine learning.

[0086] The memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash memory devices. In this embodiment, the memory is used to store at least the following computer program, which, after being loaded and executed by the processor, is capable of implementing the relevant steps of the method for avoiding BMC restart in the restart test disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory may also include operating systems and data, and the storage method may be temporary or permanent storage. The operating system may include Windows, Unix, Linux, etc. The data may include, but is not limited to, the data involved in the above-mentioned method for avoiding BMC restart in the restart test.

[0087] In a specific implementation, when the processor executes the computer program stored in the memory, it can specifically perform the following steps: Powering on the AMD server and running it. After the AMD server has booted up, a restart test is continuously performed under the operating system.

[0088] In a specific implementation, when the processor executes the computer program stored in the memory, it can specifically perform the following steps: run the main process through the BMC, capture the main process, and check if any abnormal monitoring logs are generated. Check the potential signal controlling the BMC restart and determine if the voltage value of the potential signal has abnormal fluctuations. If abnormal monitoring logs are generated or the potential signal controlling the BMC restart has abnormal fluctuations, it can be determined that there is BMC-related abnormal information in the current system.

[0089] In a specific implementation, when the processor executes the computer program stored in the memory, it can specifically implement the following steps: When abnormal information exists, it retrieves a potential signal used to control the BMC restart as first reference information, and simultaneously obtains system restart information through the CPLD as second reference information. After obtaining the first and second reference information, if both the first and second reference information are signals controlling the BMC to perform a restart operation, then the BMC performs a restart operation. Otherwise, the BMC restart operation is not performed.

[0090] In a specific implementation, when the processor executes the computer program stored in the memory, it can specifically implement the following steps: real-time monitoring of the data flow of the data pipeline between the BMC and BIOS; if the acquisition time of the current data flow exceeds twenty seconds, it is determined to be an acquisition anomaly, the data flow is deleted, and the next data flow is acquired; if the acquisition time of the next data flow exceeds twenty seconds, it is determined to be an acquisition anomaly, and the data interaction is stopped for a preset time before the data flow acquisition operation continues. If the number of acquisition anomalies exceeds ten, the data flow acquisition operation is stopped, and a prompt message is reported.

[0091] In a specific implementation, when the processor executes the computer program stored in the memory, it can specifically implement the following steps: real-time monitoring of the BMC's in-band communication data stream; if the BMC's in-band communication data stream exceeds a preset value, then performing a flow-limiting operation on the BMC's in-band communication; if the data acquired by the BMC's in-band communication is the same twice consecutively, then deleting the corresponding data; if the data acquired by the BMC's in-band communication is the same twenty times consecutively, then reducing the data transmission speed of the BMC's in-band communication.

[0092] Furthermore, the device for preventing BMC restart during the restart test in this embodiment may further include:

[0093] An input interface, used to acquire an externally imported program for avoiding BMC restart during the restart test and save it to the memory. It can also acquire various instructions and parameters transmitted from external terminal devices and pass them to the processor, so that the processor performs the corresponding processing using these instructions and parameters. In this embodiment, the input interface may specifically include, but is not limited to, a USB interface, a serial interface, a voice input interface, a fingerprint input interface, and a hard disk read interface.

[0094] An output interface is used to output various data generated by the processor to connected terminal devices, so that other terminal devices connected to the output interface can obtain the various data generated by the processor. In this embodiment, the output interface may include, but is not limited to, a USB interface, a serial interface, etc.

[0095] A communication unit is used to establish a remote communication connection between the device for preventing BMC restart during restart testing and an external server, so that the device for preventing BMC restart during restart testing can mount the image file to the external server. In this embodiment, the communication unit may specifically include, but is not limited to, a remote communication unit based on wireless communication technology or wired communication technology.

[0096] The keyboard is used to acquire various parameter data or commands input by the user through real-time keystrokes.

[0097] The monitor is used to display, in real time, relevant information regarding the process of avoiding BMC restart during the restart test.

[0098] A mouse can be used to assist users in inputting data and simplifying user operations.

[0099] This embodiment provides a device for avoiding BMC restart during restart testing. By improving the BMC's own anti-interference capability, and by making the data path through which the BMC receives abnormal information from the operating system less prone to blockage, the device prevents the BMC from performing a restart operation because of data transmission anomalies. These two measures protect the BMC when the server is used in extreme environments, ensuring that the AMD server maintains an efficient and stable operating state under extreme conditions.

[0100] This invention also discloses a readable storage medium, which includes random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable hard disk, CD-ROM, or any other form of storage medium known in the art. The readable storage medium stores a program to prevent BMC restart during a restart test, which, when executed by a processor, performs the following steps:

[0101] 1. Power on the AMD server and start a restart test under the current system.

[0102] 2. Check if there are any BMC-related abnormal messages in the current system.

[0103] 3. If abnormal information is found, the potential signal used to control the BMC restart is pulled as the first reference information, and the system restart information is obtained through CPLD as the second reference information. By comparing the first reference information and the second reference information, it is determined whether the BMC should perform a restart operation.

[0104] 4. Monitor the data flow of the data pipeline between the BMC and BIOS in real time, and control the acquisition operation of the data flow according to the first control mechanism.

[0105] 5. Monitor the in-band communication data stream of BMC in real time and control the in-band communication data stream according to the second control mechanism.
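
As a non-limiting illustration of step 4, the first control mechanism recited in claims 3 and 5 can be sketched as below. The function name `control_acquisition` and the action labels are hypothetical. Two points are assumptions, because the text does not settle them: abnormal acquisitions are counted cumulatively, and every slow acquisition that directly follows another slow one leads to a pause. A real implementation would wait for the preset time on `pause`; the sketch only records the decision.

```python
from typing import Iterable, List

TIME_THRESHOLD_S = 20.0  # time threshold from claim 5
ALARM_THRESHOLD = 10     # alarm threshold from claim 5


def control_acquisition(durations: Iterable[float]) -> List[str]:
    """Return the action chosen for each data-stream acquisition time (seconds)."""
    actions: List[str] = []
    anomalies = 0           # assumed cumulative count of abnormal acquisitions
    previous_slow = False   # whether the preceding acquisition was abnormal

    for duration in durations:
        if duration <= TIME_THRESHOLD_S:
            actions.append("accept")
            previous_slow = False
            continue

        anomalies += 1
        if anomalies > ALARM_THRESHOLD:
            # Too many abnormal acquisitions: stop acquiring and report.
            actions.append("stop_and_report")
            break

        if previous_slow:
            # The next stream was also slow: suspend interaction, then continue.
            actions.append("pause")
        else:
            # First slow stream: delete it and acquire the next one.
            actions.append("discard")
        previous_slow = True

    return actions
```

With acquisition times of 1, 25, 30, and 2 seconds, the sketch accepts the first stream, discards the second, pauses after the third, and accepts the fourth.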

[0106] In summary, this invention ensures that AMD servers maintain a high-efficiency and stable BMC operation when operating under extreme conditions by setting up efficient prevention and protection mechanisms for the BMC.

[0107] The various embodiments in this specification are described in a progressive manner, with each embodiment focusing on its differences from the other embodiments. For parts that are identical or similar between embodiments, the embodiments may be referred to one another. The systems disclosed in the embodiments are described only briefly because they correspond to the methods disclosed in the embodiments; relevant details can be found in the method section.

[0108] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this invention.

[0109] In the embodiments provided by this invention, it should be understood that the disclosed systems and methods can be implemented in other ways. The system embodiments described above are merely illustrative. For instance, the division of units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems, or units, and may be electrical, mechanical, or of other forms.

[0110] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs.

[0111] In addition, the functional modules in the various embodiments of the present invention can be integrated into one processing unit, or each module can exist physically separately, or two or more modules can be integrated into one unit.

[0112] Similarly, in the various embodiments of the present invention, the processing units can be integrated into one functional module, or each processing unit can exist physically separately, or two or more processing units can be integrated into one functional module.

[0113] The steps of the methods or algorithms described in conjunction with the embodiments disclosed herein can be implemented directly by hardware, a software module executed by a processor, or a combination of both. The software module can be located in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art.

[0114] Finally, it should be noted that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitations, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes said element.

[0115] The above provides a detailed description of the method, system, apparatus, and readable storage medium for avoiding BMC restart during a restart test provided by this invention. Specific examples have been used to illustrate the principles and implementation of this invention, and the descriptions of the embodiments above are intended only to help in understanding the method and core ideas of this invention. It should be noted that those skilled in the art can make various improvements and modifications to this invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of this invention.

Claims

1. A method for avoiding BMC restart during restart testing, characterized in that it comprises: powering on the AMD server and performing a restart test under the current system; checking whether BMC-related abnormal information exists in the current system; if abnormal information exists, pulling the potential signal used to control the BMC restart as first reference information, simultaneously obtaining system restart information through the CPLD as second reference information, and determining whether the BMC should perform a restart operation by comparing the first reference information and the second reference information; monitoring in real time the data flow of the data pipeline between the BMC and the BIOS, and controlling the acquisition operation of the data flow according to a first control mechanism; and monitoring in real time the in-band communication data stream of the BMC, and controlling the in-band communication data stream according to a second control mechanism; wherein checking whether BMC-related abnormal information exists in the current system includes: running the main process through the BMC, capturing the main process, and checking whether any abnormal monitoring log is generated; and checking the potential signal controlling the BMC restart and determining whether the voltage value of the potential signal fluctuates abnormally; and wherein determining whether the BMC should perform a restart operation by comparing the first reference information and the second reference information includes: if both the first reference information and the second reference information are signals that control the BMC to perform a restart operation, the BMC performs the restart operation.

2. The method for avoiding BMC restart during restart testing according to claim 1, characterized in that the BMC-related abnormal information includes: an abnormal monitoring log generated by the BMC main process, and an abnormal fluctuation in the potential signal controlling the BMC restart.

3. The method for avoiding BMC restart during restart testing according to claim 1, characterized in that the first control mechanism includes: if the acquisition time of the current data stream exceeds the time threshold, determining an acquisition anomaly, deleting the data stream, and acquiring the next data stream; if the acquisition time of the next data stream also exceeds the time threshold, determining an acquisition anomaly and stopping the data interaction for a preset time before continuing the data stream acquisition operation; and if the number of abnormal acquisitions exceeds the alarm threshold, stopping the data stream acquisition operation and reporting a prompt message.

4. The method for avoiding BMC restart during restart testing according to claim 1, characterized in that the second control mechanism includes: if the in-band communication data stream of the BMC exceeds the preset value, performing a flow-limiting operation on the in-band communication of the BMC; if the data acquired by the BMC's in-band communication is the same twice consecutively, deleting the corresponding data; and if the data acquired by the BMC's in-band communication is the same twenty times consecutively, reducing the data transmission speed of the BMC's in-band communication.

5. The method for avoiding BMC restart during restart testing according to claim 3, characterized in that the time threshold is 20 seconds and the alarm threshold is 10 times.

6. A system for avoiding BMC restart during restart testing, characterized in that it comprises: a test startup module, used to power on the AMD server and perform a restart test under the current system; an information acquisition module, used to check whether BMC-related abnormal information exists in the current system; a restart determination module, used to pull the potential signal used to control the BMC restart as first reference information when abnormal information exists, simultaneously obtain system restart information through the CPLD as second reference information, and determine whether the BMC should perform a restart operation by comparing the first reference information and the second reference information; a data transmission control module, used to monitor in real time the data flow of the data pipeline between the BMC and the BIOS, and to control the acquisition operation of the data flow according to a first control mechanism; and an in-band communication control module, used to monitor in real time the in-band communication data stream of the BMC and to control the in-band communication data stream according to a second control mechanism; wherein checking whether BMC-related abnormal information exists in the current system includes: running the main process through the BMC, capturing the main process, and checking whether any abnormal monitoring log is generated; and checking the potential signal controlling the BMC restart and determining whether the voltage value of the potential signal fluctuates abnormally; and wherein determining whether the BMC should perform a restart operation by comparing the first reference information and the second reference information includes: if both the first reference information and the second reference information are signals that control the BMC to perform a restart operation, the BMC performs the restart operation.

7. A device for avoiding BMC restart during a restart test, characterized in that it comprises: a memory, used to store a program for avoiding BMC restart during the restart test; and a processor which, when executing the program for avoiding BMC restart during the restart test, implements the steps of the method for avoiding BMC restart during restart testing as described in any one of claims 1 to 5.

8. A readable storage medium, characterized in that the readable storage medium stores a program for avoiding BMC restart during a restart test, which, when executed by a processor, implements the steps of the method for avoiding BMC restart during restart testing as described in any one of claims 1 to 5.