A method and device for monitoring out-of-band management functions of a network card
By detecting the operating mode and status of the control chip and network card, and collecting operating data, the problem of the inability to effectively monitor the out-of-band management function of the network card in the existing technology is solved, and accurate monitoring and rapid fault location of the out-of-band management function of the network card are realized.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- INSPUR SUZHOU INTELLIGENT TECH CO LTD
- Filing Date
- 2023-11-09
- Publication Date
- 2026-06-23
Figure CN117312097B_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computers, and more specifically, to a method and apparatus for monitoring the out-of-band management function of a network interface card (NIC).
Background Technology
[0002] In the era of network storage and cloud computing, certain scenarios require out-of-band management of host devices. This involves configuring network interface cards (NICs) on the host device, and out-of-band management is achieved through information exchange with the NIC. However, as the demand for NIC out-of-band management grows, the types of out-of-band interaction information and the physical communication channels between the NIC and the host device are also increasing. Before a NIC is deployed, invoking its out-of-band management function typically requires considerable time to study which out-of-band management features the NIC supports and to build a complex test environment in which dedicated machines monitor the function. This approach is time-consuming and cannot effectively monitor the out-of-band management function. Therefore, how to effectively monitor the invocation status of the out-of-band management function has become an urgent problem to be solved.
Summary of the Invention
[0003] This application provides a method and apparatus for monitoring the out-of-band management function of a network interface card (NIC), thereby at least solving the problem in related technologies that it is impossible to monitor the out-of-band management function of a NIC.
[0004] According to one embodiment of this application, a method for monitoring the out-of-band management function of a network interface card (NIC) is provided, applied to a host device connected to the NIC. The NIC is also connected to a control chip, which is used to implement the out-of-band management function of the NIC. The method includes: detecting the operating mode and operating status of the control chip and the NIC, wherein the operating mode is used to indicate the current operating stage of the connection between the control chip and the NIC, and the operating status is used to indicate the operating status of the control chip and the NIC under the operating stage; collecting operating data on the NIC that matches both the operating mode and the operating status; and monitoring the out-of-band management function of the NIC by the control chip based on the operating data.
[0005] In an exemplary embodiment, the step of collecting operational data on the network interface card (NIC) that matches the operational mode and the operational state includes: when the operational mode is an adaptation mode and the operational state indicates that the connection is normal in the adaptation mode, sending a first adaptation query command to the NIC via the NIC driver on the host device, wherein the adaptation mode indicates that the current connection between the control chip and the NIC is in the stage of adapting and debugging the target out-of-band management function between the control chip and the NIC, and the first adaptation query command instructs the NIC to report the out-of-band management channels supported by the NIC; receiving a first set returned by the NIC in response to the first adaptation query command, wherein the operational data includes the first set, and the first set records the out-of-band management channels supported by the NIC; detecting whether the first set includes the target out-of-band management channel corresponding to the target out-of-band management function; if the first set includes the target out-of-band management channel, sending a second adaptation query command to the NIC via the NIC driver on the host device, wherein the second adaptation query command instructs the NIC to report the out-of-band management protocols and out-of-band management commands supported by the NIC; and receiving a second set returned by the NIC in response to the second adaptation query command, wherein the operational data includes the second set, and the second set records the out-of-band management protocols and out-of-band management commands supported by the NIC.
[0006] Optionally, after detecting whether the first set includes the target out-of-band management channel corresponding to the target out-of-band management function, the method further includes: if the first set does not include the target out-of-band management channel, obtaining the latest version of the firmware of the network card; updating the firmware of the network card with the latest version of the firmware, and then sending the first adaptation query command to the network card again.
[0007] In an exemplary embodiment, the step of collecting operational data on the network interface card (NIC) that matches the operating mode and the operating state includes: when the operating mode is an adaptation mode and the operating state indicates a connection abnormality in the adaptation mode, or when the operating mode is a management mode and the operating state indicates a connection abnormality for the reference out-of-band management function in the management mode, sending a first control command to the NIC via the NIC driver on the host device, wherein the adaptation mode indicates that the current connection between the control chip and the NIC is in the stage of adapting and debugging the target out-of-band management function between the control chip and the NIC, the management mode indicates that the current connection between the control chip and the NIC is in the stage of the control chip performing out-of-band management on the NIC, the first control command instructs the NIC to feed back a first-level operation log of the NIC or of the reference out-of-band management function, the operation logs that record interaction data are divided into multiple levels, the higher the level of an operation log the greater the amount of interaction data it records, and the multiple levels include the first level; receiving a first log set returned by the NIC in response to the first control command, wherein the operational data includes the first log set, and the first log set is used to locate the cause of the connection abnormality; if the cause of the connection abnormality cannot be located from the first log set, sending a second control command to the NIC via the NIC driver on the host device, wherein the second control command instructs the NIC to feed back a second-level operation log of the NIC or of the reference out-of-band management function, the second level is higher than the first level, and the multiple levels include the second level; and receiving a second log set returned by the NIC in response to the second control command, wherein the operational data includes the second log set, and the second log set is used to locate the cause of the connection abnormality.
[0008] In one exemplary embodiment, the method further includes: when the operating state is detected to have recovered from connection abnormality to connection normality, sending a third control command to the network card via the network card driver on the host device, wherein the third control command is used to control the network card to recover to a default level for recording operating logs, the default level being one of the plurality of levels, or the default level being no recording of operating logs.
[0009] In an exemplary embodiment, detecting the operating mode between the control chip and the network interface card (NIC) and the operating state within the operating mode includes: detecting the operating mode between the control chip and the NIC, wherein the operating mode includes: an adaptation mode and a management mode, the adaptation mode indicating that the current connection between the control chip and the NIC is in a stage of adapting and debugging a target out-of-band management function between the control chip and the NIC, and the management mode indicating that the current connection between the control chip and the NIC is in a stage where the control chip performs out-of-band management of the NIC; when the operating mode is the adaptation mode, detecting a first connection state between the control chip and the NIC as the operating state; when the operating mode is the management mode, detecting a second connection state of each out-of-band management interface between the control chip and the NIC as the operating state.
[0010] In one exemplary embodiment, the method further includes: sending a parameter query command to the network card via a network card driver on the host device, wherein the parameter query command is used to query out-of-band management parameters of the network card; and receiving target out-of-band management parameters returned by the network card in response to the parameter query command, wherein the target out-of-band management parameters are deployed in the network card firmware installed on the network card.
[0011] According to another embodiment of this application, a monitoring device for out-of-band management function of a network interface card (NIC) is provided, comprising: a detection module for detecting the operating mode between a control chip and the NIC and the operating status within the operating mode, wherein the operating mode indicates the current operating stage of the connection between the control chip and the NIC, and the operating status indicates the operating condition of the control chip and the NIC within the operating stage; a data acquisition module for acquiring operating data on the NIC that matches both the operating mode and the operating status; and a monitoring module for monitoring the out-of-band management function of the NIC by the control chip based on the operating data.
[0012] According to yet another embodiment of this application, a computer-readable storage medium is also provided, wherein a computer program is stored therein, and the computer program is configured to perform the steps in any of the above method embodiments when it is run.
[0013] According to yet another embodiment of this application, an electronic device is also provided, including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
[0014] This application identifies the operational stage of the connection between the control chip and the network card for out-of-band management by detecting the operating mode between them. Furthermore, it determines the operational status of the control chip and the network card in the current operational stage by detecting the operating status within the operating mode. By collecting operational data from the network card that matches the current operational stage and status, this operational data can be used to effectively monitor the out-of-band management function of the network card. This solves the problem in related technologies where monitoring the out-of-band management function of the network card is impossible, achieving accurate monitoring of the network card's out-of-band management function.
Attached Figure Description
[0015] Figure 1 is a hardware structure block diagram of a server device for a monitoring method of out-of-band management function of a network interface card according to an embodiment of this application;
[0016] Figure 2 is a flowchart of a method for monitoring the out-of-band management function of a network interface card according to an embodiment of this application;
[0017] Figure 3 is an optional out-of-band management function adaptation flowchart according to an embodiment of this application;
[0018] Figure 4 is a schematic diagram of an optional system interaction according to an embodiment of this application;
[0019] Figure 5 is a structural block diagram of a monitoring device for out-of-band management of network cards according to an embodiment of this application.
Detailed Implementation
[0020] The embodiments of this application will be described in detail below with reference to the accompanying drawings and examples.
[0021] It should be noted that the terms "first," "second," etc., in the specification, claims, and drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
[0022] The method embodiments provided in this application can be executed on a server device or a similar computing device. Taking execution on a server device as an example, Figure 1 is a hardware structure block diagram of a server device for a monitoring method of out-of-band management function of a network interface card (NIC) according to an embodiment of this application. As shown in Figure 1, the server device may include one or more processors 102 (only one is shown in Figure 1; the processor 102 may include, but is not limited to, a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 for storing data. The server device may further include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in Figure 1 is for illustrative purposes only and does not limit the structure of the server device described above. For example, the server device may include more or fewer components than shown in Figure 1, or have a configuration different from that shown in Figure 1.
[0023] The memory 104 can be used to store computer programs, such as application software programs and modules, like the computer program corresponding to the monitoring method for out-of-band management of the network card in this embodiment. The processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, thus implementing the above-described method. The memory 104 may include high-speed random access memory and non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 104 may further include memory remotely located relative to the processor 102, and these remote memories can be connected to server devices via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
[0024] The transmission device 106 is used to receive or send data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider for the server device. In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 106 may be a Radio Frequency (RF) module used for wireless communication with the Internet.
[0025] This embodiment provides a method for monitoring the out-of-band management function of a network interface card (NIC). Figure 2 is a flowchart of a monitoring method for the out-of-band management function of a network interface card (NIC) according to an embodiment of this application. As shown in Figure 2, the process includes the following steps:
[0026] Step S202: Detect the operating mode and operating status of the control chip and the network card, wherein the operating mode is used to indicate the current operating stage of the connection between the control chip and the network card, and the operating status is used to indicate the operating status of the control chip and the network card in the operating stage;
Step S204: Collect operating data on the network card that matches both the operating mode and the operating status;
[0027] Step S206: Monitor the out-of-band management function of the control chip for the network card based on the running data.
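The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this application: the names `Mode`, `State`, `detect`, `collect`, and `monitor`, the dictionary used as a stand-in for the network card, and the choice of which data is returned in each case are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    ADAPTATION = "adaptation"
    MANAGEMENT = "management"


class State(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


def detect(link):
    """S202: read the operating mode and state (here from a plain dict)."""
    return Mode(link["mode"]), State(link["state"])


def collect(nic, mode, state):
    """S204: pick the operating data that matches both mode and state.

    The mapping below is an assumption made for this sketch.
    """
    if state is State.ABNORMAL:
        return {"logs": nic["logs"]}
    if mode is Mode.ADAPTATION:
        return {"channels": nic["channels"]}
    return {"parameters": nic["parameters"]}


def monitor(link, nic):
    """S206: base the monitoring result on the collected data."""
    mode, state = detect(link)
    return {"mode": mode, "state": state, "data": collect(nic, mode, state)}


# Hypothetical network card contents, for illustration only.
nic = {"channels": ["SMBus"], "parameters": {"package_id": 0}, "logs": ["timeout"]}
report = monitor({"mode": "adaptation", "state": "normal"}, nic)
```

In this sketch, an abnormal state always selects logs, while a normal state selects channel information in adaptation mode and parameters in management mode.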
[0028] Through the above steps, the operating mode between the control chip and the network card is detected to determine the operating stage of the connection between the control chip and the network card for implementing out-of-band management functions. By detecting the operating status in the operating mode, the operating status of the control chip and the network card in the current operating stage is known. Then, by collecting operating data on the network card that matches the current operating stage and operating status, the out-of-band management function of the network card can be effectively monitored. This solves the problem that it is impossible to monitor the out-of-band management function of the network card in related technologies, and achieves the effect of accurate monitoring of the out-of-band management function of the network card.
[0029] In the embodiment provided in step S202 above, the control chip is a chip that communicates with the network card. Through information interaction with the network card, it realizes the out-of-band management function of the host device. The control chip can be, but is not limited to, a BMC (Baseboard Management Controller) chip, an FPGA (Field Programmable Gate Array) chip, etc. This solution does not limit the type of control chip.
[0030] Optionally, in the embodiments of this application, the operating mode between the control chip and the network card may include, but is not limited to, adaptation mode and management mode. In adaptation mode, the control chip needs to develop and adapt the out-of-band management function of the network card. At this time, the control chip develops and adapts the out-of-band management function of the network card by exchanging information with the network card and establishing a connection relationship. Management mode is the stage of out-of-band management of the network card after the connection relationship between the control chip and the network card has been established.
[0031] Optionally, in this embodiment, the operating mode and operating state of the control chip and network card can be determined by, but not limited to, analyzing the interaction logs between the control chip and the network card. After receiving data from the control chip, the network card sends the data to the host device for storage. This data is used by the host device to monitor the out-of-band management function of the control chip on the network card. When the host device detects that it needs to monitor the out-of-band management function of the control chip on the network card, the host device identifies the key fields that represent the current operating mode in the interaction data exchanged between the network card and the control chip and reported by the network card. The host device then determines the current operating mode, and the operating state within that mode, based on subsequent message information (such as whether subsequent messages are received). This solution does not limit this aspect.
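One way to picture this log-based detection is the sketch below. It assumes that each interaction record is a dictionary and that a hypothetical `"mode"` key acts as the key field; the rule "a message follows the mode marker, so the connection is normal" is a simplified reading of the "whether subsequent messages are received" example, not a rule stated by this application.

```python
def detect_mode_and_state(messages):
    """Return (mode, state) from a list of interaction records.

    The "mode" key field is a hypothetical name chosen for this sketch.
    """
    mode = None
    state = "unknown"
    for index, message in enumerate(messages):
        if "mode" in message:
            mode = message["mode"]
            # The state depends on whether any message follows the mode marker.
            state = "normal" if index + 1 < len(messages) else "abnormal"
    return mode, state


# Hypothetical interaction records, for illustration only.
healthy = [{"mode": "adaptation"}, {"payload": "channel query"}]
stalled = [{"mode": "management"}]
```

Here `healthy` yields the adaptation mode with a normal state, while `stalled` yields the management mode with an abnormal state because nothing follows the mode marker.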
[0032] In the embodiment provided in step S204 above, the running data is data used to implement out-of-band management functions with the control chip. The running data may include, but is not limited to, data characterizing the support status of each channel of the out-of-band management function, OEM commands defined by the network card manufacturer, transmission protocols, configuration information used by the transmission protocols, etc. This solution does not limit this.
[0033] Optionally, in this embodiment, the running data related to out-of-band management functions is added to the firmware of the network card. The firmware is a program stored in the network card's Flash memory. After the network card is powered on, the firmware is loaded and run to realize the function of the network card sending and receiving network packets. The host device can then quickly obtain the basic information of the network card's out-of-band management function by sending a network card out-of-band management function query command to the network card. The information that can be queried with this command includes the physical channel medium supported by the current network card out-of-band management function, the transmission protocol (e.g., NCSI protocol), the support status of out-of-band management commands, and the network card's operating status.
[0034] In the embodiment provided in step S206 above, monitoring the out-of-band management function of the network card refers to the operations performed while the control chip invokes the out-of-band management function, so that the invocation can succeed. For example, when the current operating mode is the adaptation mode, the host device's monitoring operation is to perform adaptation debugging on the network card that needs to be adapted, so as to establish the connection between the control chip and the network card. Figure 3 is an optional out-of-band management function adaptation flowchart according to an embodiment of this application. As shown in Figure 3, it includes at least the following steps:
[0035] S301. Send an out-of-band management function query command to the network card to query the out-of-band management channels supported by the current network card firmware and determine the out-of-band management channels supported by the network card.
[0036] S302. Check whether the network card supports the target out-of-band management channel that needs to be adapted. If it supports the target out-of-band management channel, further query the transmission protocols and related commands supported by the network card.
[0037] S303. If the current network card firmware does not support the out-of-band management channel or transmission protocol and commands that need to be adapted, the network card firmware needs to be upgraded to configure the out-of-band management channel, transmission protocol and commands. If the latest firmware does not support the out-of-band management channel or transmission protocol and commands that need to be adapted, it can be considered that the network card does not currently support the out-of-band management function that needs to be adapted.
[0038] S304. If the current network card firmware supports the out-of-band management channel, transmission protocol, and related commands that need to be adapted, then the out-of-band management function that needs to be adapted can be adapted and debugged. If a communication failure occurs between the control chip and the network card when adapting the network card's out-of-band management commands, the driver can send an out-of-band management function query command to the network card to query the setting parameters related to the current adapted function (such as querying the network card firmware package ID, EID support status, etc.). At the same time, the output log level of the network card firmware's out-of-band management function is set, and the interaction data information of the current adapted channel of the network card firmware is output to the network card driver and printed out. By analyzing the relevant parameters of the network card's out-of-band management function and the interaction data log of the current adapted channel of the network card firmware, the problem can be analyzed and quickly located.
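The decision logic of steps S301 to S304 can be sketched as follows. This is an illustration only: the function `adapt`, the dictionary layout of the firmware, and the channel names are hypothetical, and the sketch assumes a single upgrade attempt to the latest firmware before the function is reported as unsupported.

```python
def adapt(nic, target_channel, target_protocol, latest_firmware):
    """Return "ready" or "unsupported" for the target out-of-band function."""
    for attempt in range(2):  # current firmware first, then one upgrade
        firmware = nic["firmware"]
        channel_ok = target_channel in firmware["channels"]      # S301, S302
        protocol_ok = target_protocol in firmware["protocols"]   # S302
        if channel_ok and protocol_ok:
            return "ready"                                        # S304
        if attempt == 0:
            nic["firmware"] = latest_firmware                     # S303
    return "unsupported"


# Hypothetical firmware descriptions, for illustration only.
old_fw = {"channels": ["RBT"], "protocols": ["NCSI"]}
new_fw = {"channels": ["RBT", "SMBus"], "protocols": ["NCSI"]}

upgradable_nic = {"firmware": old_fw}
result_after_upgrade = adapt(upgradable_nic, "SMBus", "NCSI", new_fw)

stuck_nic = {"firmware": old_fw}
result_without_support = adapt(stuck_nic, "PCIe", "NCSI", new_fw)
```

The first call succeeds only after the upgrade adds the requested channel; the second call ends as unsupported because even the latest firmware lacks it, which mirrors the conclusion drawn in S303.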
[0039] Optionally, in this embodiment, the monitoring of the out-of-band management function of the network interface card (NIC) can also involve fault location and maintenance operations for faults occurring during the out-of-band management call phase. This can be achieved by sending commands related to the out-of-band management function to the NIC, such as query commands for NIC out-of-band management function parameters, to obtain the operational data configured on the NIC related to the current out-of-band management function. The fault location for the out-of-band management function call fault can then be determined based on this operational data. Alternatively, a log printing command can be sent to the NIC to obtain the log data stored on the NIC. Analysis of the log data can then help locate the fault. After locating the out-of-band management function call fault, the NIC firmware is upgraded based on the identified fault cause to maintain the fault. Specifically, during the firmware upgrade, the target operational data corresponding to the fault cause can be obtained, and the target operational data can be updated to the corresponding data location in the NIC firmware to complete the NIC firmware update.
[0040] In an exemplary embodiment, the step of collecting operational data on the network interface card (NIC) that matches the operational mode and the operational state includes: when the operational mode is an adaptation mode and the operational state indicates that the connection is normal in the adaptation mode, sending a first adaptation query command to the NIC via the NIC driver on the host device, wherein the adaptation mode indicates that the current connection between the control chip and the NIC is in the stage of adapting and debugging the target out-of-band management function between the control chip and the NIC, and the first adaptation query command instructs the NIC to report the out-of-band management channels supported by the NIC; receiving a first set returned by the NIC in response to the first adaptation query command, wherein the operational data includes the first set, and the first set records the out-of-band management channels supported by the NIC; detecting whether the first set includes the target out-of-band management channel corresponding to the target out-of-band management function; if the first set includes the target out-of-band management channel, sending a second adaptation query command to the NIC via the NIC driver on the host device, wherein the second adaptation query command instructs the NIC to report the out-of-band management protocols and out-of-band management commands supported by the NIC; and receiving a second set returned by the NIC in response to the second adaptation query command, wherein the operational data includes the second set, and the second set records the out-of-band management protocols and out-of-band management commands supported by the NIC.
[0041] Optionally, in this embodiment, the out-of-band management function related operation data is stored on the network card, and an out-of-band management function processing module is added to the network card firmware. On the host device, the network card driver can send a network card out-of-band management function query command to the network card to query the physical channel medium, transmission protocol (e.g., NCSI protocol), out-of-band management command support status, and network card operation status supported by the current network card out-of-band monitoring function, etc., thereby realizing the adaptation of the out-of-band management function between the network card and the control chip, as well as the normal connection between the control chip and the network card.
[0042] Based on the above, the network card driver on the host device sends adaptation query commands to the network card, thereby querying the out-of-band management channels, out-of-band management protocols, and out-of-band management commands configured on the network card according to the data requirements of the adaptation mode, thus ensuring the accuracy of the obtained running data.
[0043] In an exemplary embodiment, after detecting whether the first set includes the target out-of-band management channel corresponding to the target out-of-band management function, the method further includes: if the first set does not include the target out-of-band management channel, obtaining the latest version of the firmware of the network card; updating the firmware of the network card with the latest version of the firmware, and then sending the first adaptation query command to the network card again.
[0044] Optionally, in this embodiment of the application, updating the network card firmware using the latest version of the firmware can be done by replacing the old firmware on the network card with the latest version of the firmware, or by extracting the target field related to the target out-of-band management channel from the latest version of the firmware and adding the target field to the old firmware on the network card.
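The two update strategies can be contrasted with a small sketch. It models firmware as a nested dictionary purely for illustration; real firmware images are binary, and the function names `replace_firmware` and `merge_target_field`, the `"channels"` layout, and the sample values are all assumptions.

```python
import copy


def replace_firmware(old, latest):
    """Strategy 1: discard the old firmware and install the latest version."""
    return copy.deepcopy(latest)


def merge_target_field(old, latest, target_channel):
    """Strategy 2: copy only the target channel's field into the old firmware."""
    updated = copy.deepcopy(old)
    updated["channels"][target_channel] = copy.deepcopy(
        latest["channels"][target_channel]
    )
    return updated


# Hypothetical firmware contents, for illustration only.
old = {"version": "1.0", "channels": {"RBT": {"enabled": True}}}
latest = {
    "version": "2.0",
    "channels": {"RBT": {"enabled": True}, "SMBus": {"address": 0x32}},
}

replaced = replace_firmware(old, latest)
merged = merge_target_field(old, latest, "SMBus")
```

In this model, full replacement moves the card to version 2.0, while the merge keeps version 1.0 and adds only the field for the target channel.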
[0045] Through the above steps, even when the current firmware does not include the target out-of-band management channel, updating the network card firmware gives the updated firmware the relevant configuration for the target out-of-band management channel. This achieves automatic configuration of the target out-of-band management channel on the network card and avoids the situation in related technologies where, when the network card does not support the target out-of-band management channel, maintenance personnel must remove the network card from the host and configure it manually. This improves the maintenance efficiency of the out-of-band management channels supported by the network card.
[0046] In an exemplary embodiment, collecting operational data on the network interface card (NIC) that matches the operating mode and the operating state includes: when the operating mode is an adaptation mode and the operating state indicates a connection abnormality in the adaptation mode, or when the operating mode is a management mode and the operating state indicates a connection abnormality for a reference out-of-band management function in the management mode, sending a first control command to the NIC via the NIC driver on the host device, wherein the adaptation mode indicates that the current connection between the control chip and the NIC is in the stage of adapting and debugging the target out-of-band management function between the control chip and the NIC, the management mode indicates that the current connection between the control chip and the NIC is in the stage of the control chip performing out-of-band management on the NIC, and the first control command instructs the NIC to feed back a first-level operation log of the NIC or of the reference out-of-band management function, wherein the operation logs in which the NIC records interaction data are divided into multiple levels, a higher-level operation log records a greater amount of information about the interaction data, and the multiple levels include the first level; receiving a first log set returned by the NIC in response to the first control command, wherein the operational data includes the first log set, and the first log set is used to locate the cause of the connection abnormality; if the cause of the connection abnormality cannot be located using the first log set, sending a second control command to the NIC via the NIC driver on the host device, wherein the second control command instructs the NIC to feed back a second-level operation log of the NIC or of the reference out-of-band management function, the second level is higher than the first level, and the multiple levels include the second level; and receiving a second log set returned by the NIC in response to the second control command, wherein the operational data includes the second log set, and the second log set is used to locate the cause of the connection abnormality.
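The escalation from a first-level log to a higher-level log can be illustrated with a minimal Python sketch. It assumes three numeric log levels and two caller-supplied callables, `request_logs` and `locate_cause`. These names, the level values, and the return shape are hypothetical: they stand in for the control commands sent through the NIC driver and for whatever analysis is applied to the returned log set.

```python
from __future__ import annotations

from typing import Callable, Optional

LOG_LEVELS = [1, 2, 3]  # hypothetical levels, ordered from lowest to highest


def collect_until_located(
    request_logs: Callable[[int], list[str]],
    locate_cause: Callable[[list[str]], Optional[str]],
    first_level: int = LOG_LEVELS[0],
) -> tuple[Optional[str], int]:
    """Request logs at rising levels until a cause is located or levels run out.

    Returns the located cause (or None) and the last level that was requested.
    """
    cause: Optional[str] = None
    level = first_level
    for level in (lv for lv in LOG_LEVELS if lv >= first_level):
        # Stands in for a control command sent to the NIC via the NIC driver.
        logs = request_logs(level)
        cause = locate_cause(logs)
        if cause is not None:
            break  # no need to request a more verbose level
    return cause, level
```

Starting at the lowest level and stopping as soon as a cause is found keeps the amount of log data transferred from the NIC as small as the diagnosis allows.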
[0047] Optionally, in this embodiment of the application, the logs that record the data interaction between the network card and the control chip are maintained hierarchically and divided into multiple levels. The interaction data is classified by data dimension, each dimension is assigned a weight, and logs of different levels record data of different dimensions. When the first control command requests interaction data of a certain level, the interaction data of the levels below that level in the level ordering is also retrieved by default. Therefore, the higher the requested level, the larger the amount of interaction data obtained.
[0048] Optionally, in this embodiment, the first level can be a default level, such as the lowest operation log level; when the operation log at this level cannot locate the cause of the connection anomaly, the level is raised. Alternatively, the first level can be an operation log level predicted from the network card's operating mode and operating status. Predicting the level avoids the extra acquisition time that would be spent obtaining lower-level operation logs one after another. To predict the level, the first level that matches the current operating mode and current operating status is looked up in a stored correspondence between operating mode, operating status, and log level.
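As an illustration only, the lookup of a predicted first level could be a simple table keyed by operating mode and operating status. The table contents, the string labels, and the function name below are assumptions made for this sketch and are not taken from the application.

```python
DEFAULT_LEVEL = 1  # hypothetical lowest operation log level

# Hypothetical correspondence between (operating mode, operating status) and log level.
LEVEL_BY_MODE_AND_STATUS = {
    ("adaptation", "connection_abnormal"): 2,
    ("management", "connection_abnormal"): 1,
}


def predict_first_level(mode: str, status: str) -> int:
    """Return the log level mapped to (mode, status), or the default level if unmapped."""
    return LEVEL_BY_MODE_AND_STATUS.get((mode, status), DEFAULT_LEVEL)
```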
[0049] Optionally, in the embodiments of this application, the second level can be the log level that is adjacent to and higher than the first level in the log level sequence. Alternatively, it can be a log level predicted from the result of locating the cause with the first log set. For example, the first log set is used to obtain a location result for the cause of the connection anomaly, the target adjustment step size corresponding to that location result is looked up in a stored correspondence between location results and level adjustment step sizes, and the second level is obtained by adding the target adjustment step size to the first level.
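The step-size rule can be written as a short worked example. The location-result labels, the step values, and the clamp to a highest level are assumptions introduced for this sketch; the application only states that the second level equals the first level plus the step size matched to the location result.

```python
from typing import Optional

MAX_LEVEL = 4  # hypothetical highest log level

# Hypothetical correspondence between a location result and a level adjustment step.
STEP_BY_LOCATION_RESULT = {
    "no_clue": 2,       # nothing useful found: jump two levels
    "partial_clue": 1,  # some evidence found: move to the adjacent level
}


def second_level(first_level: int, location_result: Optional[str] = None) -> int:
    """Return first_level plus the matched step, defaulting to the adjacent level."""
    step = STEP_BY_LOCATION_RESULT.get(location_result, 1)
    # Clamping to MAX_LEVEL is an assumption of this sketch.
    return min(first_level + step, MAX_LEVEL)
```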
[0050] Based on the above, the operation logs storing interactive data on the network card are divided into multiple levels. The higher the level of the operation log, the greater the amount of interactive data stored. Interactive data is obtained from low to high log levels, which avoids the data transmission load caused by requesting the highest-level data directly.
[0051] In one exemplary embodiment, the method further includes: when the operating state is detected to have recovered from connection abnormality to connection normality, sending a third control command to the network card via the network card driver on the host device, wherein the third control command is used to control the network card to recover to a default level for recording operating logs, the default level being one of the plurality of levels, or the default level being no recording of operating logs.
[0052] Through the above steps, when the connection recovers from an abnormal state to a normal state, the level at which operation logs are recorded is restored to the default level, or the recording of operation logs is switched off. This prevents excessive logging from adding to the network card's operating load while the connection is normal.
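A minimal sketch of this recovery behaviour, assuming the host observes a stream of connection states: a restore command is issued only on the transition from abnormal to normal. The class name, the state labels, and the command dictionary are hypothetical; `None` is used here to model a default level of not recording operation logs.

```python
from typing import Callable, Optional


class LogLevelRestorer:
    """Sends one restore command when the connection goes from abnormal to normal."""

    def __init__(
        self,
        send_command: Callable[[dict], None],
        default_level: Optional[int] = None,
    ) -> None:
        self._send = send_command        # stands in for the NIC driver command path
        self._default = default_level    # None models "do not record operation logs"
        self._previous: Optional[str] = None

    def observe(self, state: str) -> None:
        """Record the latest connection state and restore the level on recovery."""
        if self._previous == "abnormal" and state == "normal":
            self._send({"cmd": "set_log_level", "level": self._default})
        self._previous = state
```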
[0053] In an exemplary embodiment, detecting the operating mode between the control chip and the network interface card (NIC) and the operating state within the operating mode includes: detecting the operating mode between the control chip and the NIC, wherein the operating mode includes: an adaptation mode and a management mode, the adaptation mode indicating that the current connection between the control chip and the NIC is in a stage of adapting and debugging a target out-of-band management function between the control chip and the NIC, and the management mode indicating that the current connection between the control chip and the NIC is in a stage where the control chip performs out-of-band management of the NIC; when the operating mode is the adaptation mode, detecting a first connection state between the control chip and the NIC as the operating state; when the operating mode is the management mode, detecting a second connection state of each out-of-band management interface between the control chip and the NIC as the operating state.
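The mode-dependent choice of operating state can be sketched as follows. The enum, the function signature, and the interface names used in the usage note are assumptions for illustration: in adaptation mode a single connection state is reported, and in management mode one state is reported per out-of-band management interface.

```python
from enum import Enum
from typing import Dict


class Mode(Enum):
    ADAPTATION = "adaptation"
    MANAGEMENT = "management"


def detect_operating_state(
    mode: Mode, link_ok: bool, interface_ok: Dict[str, bool]
) -> Dict[str, str]:
    """Return the operating state that corresponds to the detected mode."""

    def label(ok: bool) -> str:
        return "normal" if ok else "abnormal"

    if mode is Mode.ADAPTATION:
        # First connection state: the control chip to NIC connection as a whole.
        return {"connection": label(link_ok)}
    # Second connection state: one entry per out-of-band management interface.
    return {name: label(ok) for name, ok in interface_ok.items()}
```

For example, `detect_operating_state(Mode.MANAGEMENT, True, {"channel_a": True, "channel_b": False})` reports `channel_b` as abnormal while `channel_a` stays normal, which narrows log collection to the failing interface.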
[0054] In one exemplary embodiment, the method further includes: sending a parameter query command to the network card via a network card driver on the host device, wherein the parameter query command is used to query out-of-band management parameters of the network card; and receiving target out-of-band management parameters returned by the network card in response to the parameter query command, wherein the target out-of-band management parameters are deployed in the network card firmware installed on the network card.
[0055] This application describes a method for viewing the out-of-band management parameters of a network interface card (NIC) and monitoring the operating status of its out-of-band management function via in-band communication (i.e., data interaction between the host device and the NIC). Out-of-band communication refers to the data interaction between the NIC and the control chip; the control chip monitors or manages the host device via out-of-band communication, for example by monitoring and managing the operating status of components on the host device. The method stores the NIC's out-of-band management parameters in the NIC firmware and adds an out-of-band management function processing module to the firmware. On the host device, the NIC driver can send an out-of-band management function query command to the NIC to query parameters related to the NIC's out-of-band management function, such as the physical channel media and transmission protocols (e.g., NCS) supported by the current NIC out-of-band management function, the supported out-of-band management commands, and the operating status of the out-of-band management function, so that relevant personnel can quickly understand the basic information of the NIC's out-of-band management function. When the NIC's out-of-band management function is being developed and adapted for the BMC (baseboard management controller), the NIC driver can send relevant commands to the NIC to set the level at which the operation logs of each out-of-band management channel are collected. Based on the set level, the NIC firmware reports the operation logs of each out-of-band management channel at that level to the driver, and the driver collects and prints them. Analyzing the logs allows problems with the NIC's out-of-band management function to be located and debugged quickly. For example, when the collection level is set to a lower level, only simple statistics such as the number of out-of-band management commands received by the NIC are collected; when it is set to a higher level, the specific data packets sent and received on each out-of-band management channel can be printed. When intermittent communication failures occur between the BMC and the NIC during actual use, relevant commands can likewise be sent to the NIC through the NIC driver to set the log collection level of each out-of-band management channel, so that the operation logs of each channel are collected during actual operation and analyzed to quickly locate problems with the NIC's out-of-band management function. By default, the NIC firmware does not report the operation data of each out-of-band management channel to the driver, so the normal functions of the NIC are not affected.
[0056] Figure 4 is an optional system interaction diagram according to an embodiment of this application. As shown in Figure 4, the network card is connected to the host device and also to the control chip (BMC). The control chip is used to implement the network card's out-of-band management function. Adding out-of-band management function parameters and a processing module to the network card firmware allows the network card to support more out-of-band management parameter query functions without changing the current network card hardware. When the network card's out-of-band management function configuration is updated online using tools, the support status displayed when querying the network card's out-of-band management function should change accordingly. The network card firmware should collect all data packets (including erroneous packets) received from the out-of-band management channel and upload the data to the driver, so that as much data as possible is available for troubleshooting. When there is a large amount of out-of-band management interaction data between the BMC and the network card, the relevant interaction data should be preserved as far as possible while minimizing the impact on other functions of the network card.
[0057] Add out-of-band management function parameters and an out-of-band management function processing module to the network card firmware. The out-of-band management function parameters will be used to store the configuration parameters of each channel of the network card's out-of-band management, such as the support status of the network card firmware for each channel of out-of-band management, the OEM commands defined by the network card manufacturer, and the configuration information used by each transmission protocol. The out-of-band management function processing module is used to receive and process commands related to out-of-band management functions sent by the network card driver, such as query commands for network card out-of-band management function parameters or settings for the log printing level of network card out-of-band management functions.
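The role of the processing module can be modelled in a few lines of Python. This is a host-side model for illustration, not firmware code: the class name, the command dictionaries, and the parameter keys (`channels`, `oem_commands`, `protocol_config`) are all assumptions. It shows the two command kinds named above, a parameter query and a per-channel log level setting.

```python
from typing import Dict, Optional


class OobFirmwareModule:
    """Toy model of a firmware module that answers out-of-band management commands."""

    def __init__(self, params: dict) -> None:
        self._params = dict(params)
        # channel -> log level; an absent channel means nothing is reported for it
        self._log_levels: Dict[str, int] = {}

    def handle(self, command: dict) -> dict:
        """Dispatch a command received from the NIC driver and return a reply."""
        kind = command.get("cmd")
        if kind == "query_params":
            return {"status": "ok", "params": dict(self._params)}
        if kind == "set_log_level":
            channel: str = command["channel"]
            level: Optional[int] = command["level"]
            if channel not in self._params.get("channels", {}):
                return {"status": "error", "reason": "unknown channel"}
            if level is None:
                self._log_levels.pop(channel, None)  # back to not reporting
            else:
                self._log_levels[channel] = level
            return {"status": "ok", "log_levels": dict(self._log_levels)}
        return {"status": "error", "reason": "unsupported command"}
```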
[0059] When relevant personnel adapt the out-of-band management function of the network card, they can follow these steps:
[0060] 1) First, send a network card out-of-band management function query command on the host device side to query the out-of-band management channels supported by the current network card firmware and determine the out-of-band management channels supported by the network card.
[0061] 2) If the network card supports the out-of-band management channel that needs to be adapted, you can further query the transmission protocols and related commands supported by the network card;
[0062] 3) If the current network card firmware does not support the out-of-band management channel, transmission protocol, or commands that need to be adapted, contact the network card vendor to obtain the latest firmware and start the adaptation process again from step 1). If the latest firmware still does not support the out-of-band management channel, transmission protocol, or commands that need to be adapted, it can be concluded that the network card does not currently support the out-of-band management function that needs to be adapted.
[0063] 4) If the current network card firmware supports the out-of-band management channel, transmission protocol and related commands that need to be adapted, then the out-of-band management function that needs to be adapted can be adapted and debugged.
[0064] 5) If a communication failure occurs between the BMC and the network card when adapting out-of-band management commands, an out-of-band management function query command can be sent from the host device to the network card via the driver to query the setting parameters related to the current adaptation function (such as querying the network card firmware package ID, EID support status, etc.). Simultaneously, the out-of-band management function output log level of the network card firmware should be set, and the interaction data information of the current adaptation channel of the network card firmware should be output to the network card driver and printed out. By analyzing the relevant parameters of the network card's out-of-band management function and the interaction data log of the current adaptation channel of the network card firmware, the problem can be analyzed and quickly located.
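Steps 1) to 4) above amount to a support check with at most one firmware update in between. The sketch below expresses that flow under stated assumptions: the three callables stand in for the driver query commands and the firmware update, and their names, the set-based return values, and the single-retry limit are hypothetical.

```python
from typing import Callable, Set


def can_adapt(
    query_channels: Callable[[], Set[str]],
    query_protocols_and_commands: Callable[[], Set[str]],
    update_firmware: Callable[[], bool],
    target_channel: str,
    required: Set[str],
) -> bool:
    """Return True when the NIC supports the target channel and everything in `required`."""
    for attempt in range(2):  # attempt 0: current firmware; attempt 1: after one update
        supported = (
            target_channel in query_channels()                # step 1): channel query
            and required <= query_protocols_and_commands()    # step 2): only if channel found
        )
        if supported:
            return True  # step 4): adaptation and debugging can proceed
        if attempt == 0 and not update_firmware():            # step 3): try the latest firmware
            return False
    return False  # still unsupported after the update
```

If the function returns False, the NIC is treated as not currently supporting the function to be adapted; step 5), the log-based fault analysis, only applies once adaptation has started.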
[0065] When communication failures or other problems that may be related to the out-of-band management function of the network card occur between the BMC and the network card during actual operation, the operation log level of the corresponding out-of-band management channel of the network card can be set to collect the operation log of the out-of-band management channel during the actual operation of the network card without affecting other functions of the network card. The cause of the problem can be identified by analyzing the log. When setting the operation level of the out-of-band management function log of the network card, it can be set from low to high until the log level reaches the highest level or the required log information is obtained.
[0066] Once the issues related to the network card's out-of-band management functions are resolved, restore the default settings for the network card's out-of-band management function logs. The network card firmware will no longer report the operating data of each out-of-band management channel to the driver, ensuring that all network card functions operate normally.
[0067] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product is stored in a storage medium (such as ROM / RAM, magnetic disk, optical disk) and includes several instructions to cause a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the various embodiments of this application.
[0068] This embodiment also provides a monitoring device for out-of-band management of network interface cards (NICs). This device is used to implement the above embodiments and preferred embodiments, and details already described will not be repeated. As used below, the term "module" can refer to a combination of software and / or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, hardware implementation, or a combination of software and hardware, is also possible and contemplated.
[0069] Figure 5 is a structural block diagram of a monitoring device for out-of-band management of network interface cards according to an embodiment of this application. As shown in Figure 5, the device includes:
[0070] The detection module is used to detect the operating mode between the control chip and the network card and the operating status in the operating mode. The operating mode is used to indicate the current operating stage of the connection between the control chip and the network card, and the operating status is used to indicate the operating status of the control chip and the network card in the operating stage.
[0071] The acquisition module is used to acquire operating data from the network card that matches both the operating mode and the operating status.
[0072] The monitoring module is used to monitor the out-of-band management function of the control chip on the network card based on the operating data.
[0073] Through the above embodiments, the operating mode between the control chip and the network card and the operating status under the operating mode are detected.
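One way to picture how the detection, acquisition, and monitoring modules fit together is as three callables run in sequence. The dataclass, its field names, and the string verdict are assumptions made for this sketch; the application leaves open whether the modules are implemented in software, hardware, or a combination.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MonitoringDevice:
    # Detection module: returns (operating mode, operating status).
    detect: Callable[[], Tuple[str, str]]
    # Acquisition module: returns operating data matching the mode and status.
    acquire: Callable[[str, str], List[str]]
    # Monitoring module: derives a verdict from the operating data.
    monitor: Callable[[List[str]], str]

    def run_once(self) -> str:
        """Run detection, acquisition, and monitoring once, in that order."""
        mode, status = self.detect()
        data = self.acquire(mode, status)
        return self.monitor(data)
```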
[0074] Optionally, the acquisition module includes:
[0075] The first sending unit is configured to send a first adaptation query command to the network card via the network card driver on the host device when the operating mode is adaptation mode and the operating status indicates that the connection is normal in the adaptation mode. The adaptation mode indicates that the current connection between the control chip and the network card is in the stage of adapting and debugging the target out-of-band management function between the control chip and the network card. The first adaptation query command instructs the network card to feed back the out-of-band management channels supported by the network card.
[0076] The first receiving unit is configured to receive the first set returned by the network card in response to the first adaptation query command, wherein the running data includes the first set, and the first set is used to record the out-of-band management channels supported by the network card;
[0077] The detection unit is used to detect whether the first set includes the target out-of-band management channel corresponding to the target out-of-band management function;
[0078] The second sending unit is configured to send a second adaptation query command to the network card via the network card driver on the host device when the target out-of-band management channel is included in the first set. The second adaptation query command is used to instruct the network card to provide feedback on the out-of-band management protocol and out-of-band management commands supported by the network card.
[0079] The second receiving unit receives the second set returned by the network card in response to the second adaptation query command, wherein the running data includes the second set, and the second set is used to record the out-of-band management protocols and out-of-band management commands supported by the network card;
[0080] The acquisition unit is configured to acquire the latest version of the firmware of the network card when the target out-of-band management channel is not included in the first set;
[0081] The third sending unit is used to send the first adaptation query command to the network card again after updating the firmware of the network card with the latest version of the firmware.
[0082] Optionally, the first transmitting unit is configured to:
[0083] When the operating mode is adaptation mode and the operating status indicates a connection abnormality in adaptation mode, or when the operating mode is management mode and the operating status indicates a connection abnormality for the reference out-of-band management function in management mode, a first control command is sent to the network card via the network card driver on the host device. The adaptation mode indicates that the current connection between the control chip and the network card is in the stage of adapting and debugging the target out-of-band management function between the control chip and the network card. The management mode indicates that the current connection between the control chip and the network card is in the stage where the control chip performs out-of-band management on the network card. The first control command instructs the network card to provide a first-level operating log for the network card or the reference out-of-band management function. The network card's operating log for recording interactive data is divided into multiple levels, with higher-level logs recording a greater amount of information about the interactive data. The multiple levels include the first level.
[0084] Optionally, the first receiving unit is configured to:
[0085] The system receives a first log set returned by the network card in response to the first control command, wherein the running data includes the first log set, and the first log set is used to locate the cause of the connection abnormality.
[0086] Optionally, the second transmitting unit is configured to:
[0087] If the first log set fails to locate the cause of the connection anomaly, a second control command is sent to the network card via the network card driver on the host device. The second control command is used to instruct the network card to provide the network card or the reference out-of-band management function's second-level operation log. The second level is higher than the first level, and the plurality of levels includes the second level.
[0088] Optionally, the second receiving unit is configured to:
[0089] The system receives a second log set returned by the network card in response to the second control command, wherein the running data includes the second log set, and the second log set is used to locate the cause of the connection abnormality.
[0090] Optionally, the third transmitting unit is configured to:
[0091] When the operating state is detected to have recovered from connection abnormality to connection normality, a third control command is sent to the network card through the network card driver on the host device. The third control command is used to control the network card to restore to the default level of recording operation logs. The default level is one of the multiple levels, or the default level is not to record operation logs.
[0092] Optionally, the detection module includes:
[0093] The first detection unit is used to detect the operating mode between the control chip and the network card, wherein the operating mode includes: adaptation mode and management mode. The adaptation mode is used to indicate that the current connection between the control chip and the network card is in the stage of adapting and debugging the target out-of-band management function between the control chip and the network card. The management mode is used to indicate that the current connection between the control chip and the network card is in the stage of the control chip performing out-of-band management of the network card.
[0094] The second detection unit is used to detect the first connection state between the control chip and the network card as the operating state when the operating mode is the adaptation mode.
[0095] The third detection unit is used to detect the second connection status of each out-of-band management interface between the control chip and the network card as the operating status when the operating mode is the management mode.
[0096] Optionally, the device further includes:
[0097] The sending module is used to send a parameter query command to the network card through the network card driver on the host device, wherein the parameter query command is used to query the out-of-band management parameters of the network card;
[0098] The receiving module is used to receive the target out-of-band management parameters returned by the network card in response to the parameter query command, wherein the target out-of-band management parameters are deployed in the network card firmware installed on the network card.
[0099] It should be noted that the above modules can be implemented by software or hardware. For the latter, they can be implemented in the following ways, but are not limited to: all the above modules are located in the same processor; or, the above modules are located in different processors in any combination.
[0100] Embodiments of this application also provide a computer-readable storage medium storing a computer program, wherein the computer program is configured to execute the steps in any of the above method embodiments when run.
[0101] In one exemplary embodiment, the aforementioned computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), portable hard disk, magnetic disk, or optical disk.
[0102] Embodiments of this application also provide an electronic device, including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
[0103] In one exemplary embodiment, the electronic device may further include a transmission device and an input / output device, wherein the transmission device is connected to the processor and the input / output device is connected to the processor.
[0104] Specific examples in this embodiment can be found in the examples described in the above embodiments and exemplary implementations, and will not be repeated here.
[0105] Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented using general-purpose computing devices. They can be centralized on a single computing device or distributed across a network of multiple computing devices. They can be implemented using computer-executable program code, and thus can be stored in a storage device for execution by a computing device. In some cases, the steps shown or described can be performed in a different order than those presented here, or they can be fabricated as separate integrated circuit modules, or multiple modules or steps can be fabricated as a single integrated circuit module. Thus, this application is not limited to any particular combination of hardware and software.
[0106] The above description is merely a preferred embodiment of this application and is not intended to limit this application. Various modifications and variations can be made to this application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc., made within the principles of this application should be included within the protection scope of this application.
Claims
1. A method for monitoring the out-of-band management function of a network interface card (NIC), characterized in that the method is applied to a host device used for connecting a NIC, wherein the NIC is also connected to a control chip, the control chip being used to implement the out-of-band management function of the NIC, the method comprising: detecting the operating mode between the control chip and the NIC and the operating status in the operating mode, wherein the operating mode is used to indicate the current operating stage of the connection between the control chip and the NIC, and the operating status is used to indicate the operating condition of the control chip and the NIC in the operating stage; collecting operating data on the NIC that matches the operating mode and the operating status; and monitoring the out-of-band management function of the control chip for the NIC based on the operating data; wherein collecting operating data on the NIC that matches the operating mode and the operating status includes: when the operating mode is adaptation mode and the operating status indicates a connection abnormality in adaptation mode, or when the operating mode is management mode and the operating status indicates a connection abnormality for a reference out-of-band management function in management mode, sending a first control command to the NIC via the NIC driver on the host device, wherein the adaptation mode indicates that the current connection between the control chip and the NIC is in the stage of adapting and debugging the target out-of-band management function between the control chip and the NIC, the management mode indicates that the current connection between the control chip and the NIC is in the stage of the control chip performing out-of-band management on the NIC, the first control command instructs the NIC to feed back a first-level operation log of the NIC or of the reference out-of-band management function, the operation logs in which the NIC records interactive data are divided into multiple levels, a higher-level operation log records a greater amount of information about the interactive data, and the multiple levels include the first level; receiving a first log set returned by the NIC in response to the first control command, wherein the operating data includes the first log set, and the first log set is used to locate the cause of the connection abnormality; if the cause of the connection abnormality cannot be located using the first log set, sending a second control command to the NIC via the NIC driver on the host device, wherein the second control command instructs the NIC to feed back a second-level operation log of the NIC or of the reference out-of-band management function, the second level is higher than the first level, and the multiple levels include the second level; and receiving a second log set returned by the NIC in response to the second control command, wherein the operating data includes the second log set, and the second log set is used to locate the cause of the connection abnormality.
2. The method according to claim 1, characterized in that collecting operating data on the network card that matches the operating mode and the operating status includes: when the operating mode is adaptation mode and the operating status indicates that the connection is normal in adaptation mode, sending a first adaptation query command to the network card through the network card driver on the host device, wherein the adaptation mode is used to indicate that the current connection between the control chip and the network card is in the stage of adapting and debugging the target out-of-band management function between the control chip and the network card, and the first adaptation query command is used to instruct the network card to feed back the out-of-band management channels supported by the network card; receiving a first set returned by the network card in response to the first adaptation query command, wherein the operating data includes the first set, and the first set is used to record the out-of-band management channels supported by the network card; detecting whether the first set includes the target out-of-band management channel corresponding to the target out-of-band management function; if the target out-of-band management channel is included in the first set, sending a second adaptation query command to the network card through the network card driver on the host device, wherein the second adaptation query command is used to instruct the network card to feed back the out-of-band management protocols and out-of-band management commands supported by the network card; and receiving a second set returned by the network card in response to the second adaptation query command, wherein the operating data includes the second set, and the second set is used to record the out-of-band management protocols and out-of-band management commands supported by the network card.
3. The method according to claim 2, characterized in that, after detecting whether the first set includes the target out-of-band management channel corresponding to the target out-of-band management function, the method further includes: if the target out-of-band management channel is not included in the first set, obtaining the latest version of the network card's firmware; and, after updating the network card's firmware to the latest version, sending the first adaptation query command to the network card again.
4. The method according to claim 1, characterized in that the method further includes: when the operating status is detected to have recovered from a connection abnormality to a normal connection, sending a third control command to the network card through the network card driver on the host device, wherein the third control command controls the network card to return to recording operation logs at a default level, and the default level is either one of the multiple levels or a level at which no operation log is recorded.
5. The method according to claim 1, characterized in that detecting the operating mode between the control chip and the network card and the operating status in the operating mode includes: detecting the operating mode between the control chip and the network card, wherein the operating mode includes an adaptation mode and a management mode, the adaptation mode indicates that the current connection between the control chip and the network card is in the stage of adapting and debugging the target out-of-band management function between the control chip and the network card, and the management mode indicates that the current connection between the control chip and the network card is in the stage of the control chip performing out-of-band management on the network card; when the operating mode is the adaptation mode, detecting a first connection status between the control chip and the network card as the operating status; and when the operating mode is the management mode, detecting a second connection status of each out-of-band management interface between the control chip and the network card as the operating status.
6. The method according to claim 1, characterized in that the method further includes: sending a parameter query command to the network card through the network card driver on the host device, wherein the parameter query command is used to query the out-of-band management parameters of the network card; and receiving target out-of-band management parameters returned by the network card in response to the parameter query command, wherein the target out-of-band management parameters are deployed in the network card firmware installed on the network card.
7. A device for monitoring the out-of-band management function of a network card, characterized in that it comprises: a detection module, configured to detect the operating mode between the control chip and the network card and the operating status in the operating mode, wherein the operating mode indicates the current operating stage of the connection between the control chip and the network card, and the operating status indicates the operating status of the control chip and the network card in that operating stage; an acquisition module, configured to acquire operating data from the network card that matches both the operating mode and the operating status; a monitoring module, configured to monitor the out-of-band management function of the control chip on the network card based on the operating data; a first sending unit, configured to send a first control command to the network card via the network card driver on the host device when the operating mode is the adaptation mode and the operating status indicates a connection abnormality in the adaptation mode, or when the operating mode is the management mode and the operating status indicates a connection abnormality of the reference out-of-band management function in the management mode, wherein the adaptation mode indicates that the current connection between the control chip and the network card is in the stage of adapting and debugging the target out-of-band management function between the control chip and the network card, the management mode indicates that the current connection between the control chip and the network card is in the stage of the control chip performing out-of-band management on the network card, the first control command instructs the network card to provide a first-level operation log of the network card or of the reference out-of-band management function, the operation logs in which the network card records interaction data are divided into multiple levels, a higher-level operation log records more information about the interaction data, and the multiple levels include the first level; a first receiving unit, configured to receive a first log set returned by the network card in response to the first control command, wherein the operating data includes the first log set, and the first log set is used to locate the cause of the connection abnormality; a second sending unit, configured to send a second control command to the network card via the network card driver on the host device when the first log set fails to locate the cause of the connection abnormality, wherein the second control command instructs the network card to provide a second-level operation log of the network card or of the reference out-of-band management function, the second level is higher than the first level, and the multiple levels include the second level; and a second receiving unit, configured to receive a second log set returned by the network card in response to the second control command, wherein the operating data includes the second log set, and the second log set is used to locate the cause of the connection abnormality.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the steps of the method described in any one of claims 1 to 6.
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method described in any one of claims 1 to 6.
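The log escalation recited in claims 1 and 4 can be illustrated with a minimal Python sketch. It is not part of the claims and is not the applicant's implementation. The class `FakeNicDriver`, the functions `locate_cause` and `find_error`, the command strings, and the sample log lines are all hypothetical names and data introduced only for illustration; a real network card driver would expose its own interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class FakeNicDriver:
    """Hypothetical stand-in for the network card driver on the host device."""
    logs_by_level: dict[int, list[str]]
    log_level: int = 0  # assumed default level: no operation log is recorded
    sent: list[str] = field(default_factory=list)

    def send_log_command(self, level: int) -> list[str]:
        # Models a control command asking the card for its level-N operation log.
        self.sent.append(f"get_log(level={level})")
        self.log_level = level
        return self.logs_by_level.get(level, [])

    def restore_default_level(self) -> None:
        # Models the third control command of claim 4.
        self.sent.append("restore_default")
        self.log_level = 0


def locate_cause(
    driver: FakeNicDriver,
    analyze: Callable[[list[str]], Optional[str]],
    levels: Sequence[int] = (1, 2),
) -> Optional[str]:
    """Request logs level by level; stop as soon as a cause is located."""
    for level in levels:
        log_set = driver.send_log_command(level)
        cause = analyze(log_set)
        if cause is not None:
            return cause
    return None


def find_error(log_set: list[str]) -> Optional[str]:
    # Toy analysis rule (an assumption): the first line containing "ERROR".
    for line in log_set:
        if "ERROR" in line:
            return line
    return None


if __name__ == "__main__":
    driver = FakeNicDriver({
        1: ["link up", "command timeout"],
        2: ["link up", "command timeout", "ERROR: response checksum mismatch"],
    })
    print(locate_cause(driver, find_error))
    driver.restore_default_level()  # once the connection is normal again
    print(driver.sent)
```

In this sketch the second, more detailed log set is requested only when the first one does not reveal a cause, which keeps the amount of recorded interaction data low until it is actually needed.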