Intelligent monitoring method and system for machine room network equipment

By combining the load deviation quotient and port throughput weight of computer room network equipment with a frequency mapping matrix and temperature analysis, the method constructs a fault early-warning risk probability. This addresses the lag of existing monitoring approaches and enables intelligent operation and maintenance of computer room network equipment.

CN122220185A · Pending · Publication Date: 2026-06-16 · STATE GRID HUBEI ELECTRIC POWER INFORMATION & TELECOMMUNICATION COMPANY

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
STATE GRID HUBEI ELECTRIC POWER INFORMATION & TELECOMMUNICATION COMPANY
Filing Date
2026-05-20
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

The existing data center monitoring model relies heavily on manually preset static thresholds and simple alarm triggering mechanisms, which cannot adapt to dynamic fluctuations in network traffic. This results in lagging and one-sided monitoring data, failing to capture instantaneous load anomalies or potential hidden dangers. It also lacks in-depth mining of historical data and trend prediction, limiting the accuracy of fault warnings and leading to low operational and maintenance response efficiency.

Method used

The method collects the bandwidth load of computer room network equipment, calculates the load deviation quotient and combines it with port throughput weights, adjusts the sampling frequency using a frequency mapping matrix, and constructs a fault early-warning risk probability from operating temperature analysis and an aging attenuation constant. Abnormal nodes are then managed hierarchically using resource level tags, so that monitoring responses are targeted and delivered in real time.

Benefits of technology

It enables precise quantification of network equipment pressure in the data center, fine-grained data collection during high-risk periods, and the construction of a forward-looking fault early warning and evaluation system, thereby improving the level of intelligent operation and maintenance and ensuring the real-time and targeted nature of monitoring response.

Abstract

The present application relates to the technical field of intelligent monitoring, and in particular to an intelligent monitoring method and system for computer room network equipment. The method comprises the following steps: calculating the equipment deviation ratio from the bandwidth load; matching the ratio against a mapping matrix to reduce the sampling interval; issuing fault early warnings based on the temperature sequence and an aging constant; extracting resource labels to divide abnormal nodes into echelons; matching response parameters and issuing targeted monitoring; and outputting an intelligent monitoring scheme. In the present application, the pressure on computer room network equipment is accurately quantified by fusing the bandwidth load deviation quotient with weights; data in high-risk periods is collected at fine granularity by using the frequency mapping matrix and sampling interval reduction; the risk is corrected by combining operating temperature sequence analysis with an aging attenuation constant, which effectively avoids the risk of missed early warnings in the traditional threshold trigger mode; abnormal nodes are managed hierarchically by means of resource level labels, so that monitoring response parameters are issued in a targeted and real-time manner; and the intelligence level of computer room network equipment operation and maintenance is improved.

Description

Technical Field

[0001] This invention relates to the field of intelligent monitoring technology, and in particular to an intelligent monitoring method and system for computer room network equipment.

Background Technology

[0002] The field of intelligent monitoring technology mainly involves the collection, processing, and analysis of data from monitored objects through intelligent means to achieve real-time monitoring and management of specific environments, equipment, or activities. It encompasses monitoring systems and applications based on artificial intelligence, big data analytics, and machine learning, and is widely used in public safety, traffic management, industrial automation, and other fields. Among these, intelligent monitoring methods for data center network equipment refer to the real-time monitoring and management of network equipment within the data center. This involves using sensors, network protocols, and other means to collect data on equipment operating status, temperature, humidity, power consumption, etc., and then analyzing and processing this data to ensure the stable operation of the data center equipment. This typically relies on manual inspections or simple remote monitoring, primarily by setting up monitoring equipment and acquiring relevant data in real time to detect equipment status. In this process, data processing mainly relies on manually set thresholds and simple alarm mechanisms. Once an abnormality or malfunction occurs in the equipment, monitoring personnel receive a warning message through the alarm system and then perform subsequent manual processing.

[0003] Existing data center monitoring methods rely heavily on manually preset static thresholds and simple alarm triggering mechanisms, which cannot adapt to the complex environmental changes brought about by dynamic fluctuations in network traffic. This results in significant lag and bias in monitoring data. Fixed-frequency sampling methods are unable to capture instantaneous load anomalies or potential hazards. Relying solely on single-dimensional status detection ignores the correlation between hardware aging and the operating environment. The lack of in-depth mining of historical data and trend prediction limits the accuracy of fault warnings, makes it impossible to achieve differentiated resource scheduling, and easily leads to low operational and maintenance response efficiency and insufficient critical task assurance capabilities.

Summary of the Invention

[0004] The purpose of this invention is to address the shortcomings of existing technologies by proposing an intelligent monitoring method for computer room network equipment.

[0005] To achieve the above objectives, the present invention adopts the following technical solution: an intelligent monitoring method for computer room network equipment, comprising the following steps:
S1: Collect the bandwidth load of the network equipment in the computer room, calculate the difference between the load and the preset rated bandwidth load to obtain the load difference, divide the load difference by the preset rated bandwidth load to obtain the load deviation quotient, extract the port throughput setting weight and multiply it by the load deviation quotient to obtain the network equipment deviation ratio data.
S2: Extract the preset monitoring frequency mapping matrix, substitute the network equipment deviation ratio data into the preset monitoring frequency mapping matrix for cross-matching, and combine the preset frequency adjustment baseline to proportionally reduce the sampling interval of the computer room network equipment to obtain the early warning monitoring sampling frequency data.
S3: Within the preset time window, collect the discrete time series of the operating temperature of the computer room network equipment at the early warning monitoring sampling frequency, calculate the arithmetic mean and the arithmetic square root of the discrete time series, divide the arithmetic square root by the arithmetic mean to obtain the temperature discrete quotient, multiply the temperature discrete quotient by the equipment aging set attenuation constant, and output the fault early warning risk probability.
S4: Compare the fault early warning risk probability with the preset risk threshold, filter out the abnormal computer room network equipment nodes that exceed the preset risk threshold, extract the preset resource level tags matching those abnormal nodes, and obtain the abnormal node resource allocation echelon.
S5: Match the abnormal node resource allocation echelon with the data center operation and maintenance standard scheme, extract the monitoring response parameters of the corresponding abnormal nodes, send them to the control terminals of the corresponding abnormal data center network equipment nodes for targeted monitoring, and output the data center network equipment intelligent monitoring scheme.

[0006] As a further aspect of the present invention, the network device deviation ratio data includes link saturation deviation and node load balancing; the early warning monitoring sampling frequency data includes monitoring clock resolution, sampling density gradient, and time slice width; the fault early warning risk probability includes heat dissipation failure risk and temperature drift fatigue coefficient; the abnormal node resource allocation hierarchy includes abnormal node resource configuration records and fault isolation levels; and the intelligent monitoring scheme for data center network equipment includes intelligent inspection topology and abnormal node status assessment records.

[0007] As a further aspect of the present invention, the step of obtaining the network device deviation ratio data specifically comprises: S111: Obtain the preset rated bandwidth load, capture the bandwidth load of the data center network equipment, and perform a subtraction operation between the bandwidth load of the data center network equipment and the preset rated bandwidth load to generate a load difference; S112: Using the load difference as a variable to be processed and the preset rated bandwidth load as a benchmark divisor, establish a mapping of the load difference relative to the preset rated bandwidth load numerical deviation ratio, and divide the two to calculate the load deviation quotient. S113: Collect port throughput setting weights, establish a linear product logical relationship between the load deviation quotient and the port throughput setting weights, and generate network device deviation ratio data.

[0008] As a further aspect of the present invention, the step of acquiring the early warning monitoring sampling frequency data specifically comprises: S211: Obtain a preset monitoring frequency mapping matrix, perform spatial dimension alignment processing on the network device deviation ratio data, extract the nonlinear feature distribution state inside the network device deviation ratio data, establish a multidimensional cross-matching relationship between the network device deviation ratio data and the preset monitoring frequency mapping matrix, identify the topology distribution set of the mapping processing nodes, and generate a frequency mapping matching vector. S212: Based on the frequency mapping matching vector, obtain the preset frequency adjustment baseline, collect sensor energy consumption threshold, link load saturation, simulated data packet loss rate and sensor real-time power consumption, calculate and obtain the frequency adjustment correction index and establish logical association, and generate frequency change response weight. S213: Statistically reduce the sampling interval of the network equipment sensors in the computer room using the frequency change response weight, construct a dynamic adjustment step mapping structure for the sampling interval, and obtain the early warning monitoring sampling frequency data.

[0009] As a further aspect of the present invention, the frequency adjustment correction index is calculated by the following formula: [formula not reproduced]. The terms of the formula are, in order: the frequency adjustment correction index, the frequency mapping matching vector, the frequency adjustment baseline, the link load saturation, the simulated packet loss rate, the sensor energy consumption threshold, and the sensor real-time power consumption.

[0010] As a further aspect of the present invention, the step of obtaining the fault warning risk probability specifically includes: S311: Obtain a preset time window, set the sampling step according to the early warning monitoring sampling frequency data, collect the discrete time series of the operating temperature of the computer room network equipment within the preset time window, calculate the arithmetic mean and arithmetic square root of the discrete time series of the operating temperature of the computer room network equipment, and generate a sequence distribution statistical vector. S312: Based on the sequence distribution statistical vector, perform a division operation between the arithmetic square root and the arithmetic mean to establish a ratio correlation and obtain the temperature discrete quotient; S313: Obtain the device aging set attenuation constant, establish a linear product relationship between the temperature discrete quotient and the device aging set attenuation constant, perform a weight correction for the lifespan attenuation of the data center network equipment for the temperature discrete quotient, and generate a fault warning risk probability.
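Steps S311 to S313 can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the text's "arithmetic square root" is read here as the square root of the variance of the temperature series (the population standard deviation), which makes the temperature discrete quotient a coefficient of variation. The function names and the sample values are hypothetical and do not come from the patent.

```python
from statistics import fmean, pstdev


def temperature_discrete_quotient(temps):
    """Ratio of dispersion to mean for a temperature series (S311-S312).

    ASSUMPTION: "arithmetic square root" is interpreted as the population
    standard deviation of the series; the source text does not spell this out.
    """
    mean = fmean(temps)
    if mean == 0:
        raise ValueError("mean temperature is zero; quotient undefined")
    return pstdev(temps) / mean


def fault_warning_risk(temps, aging_decay_constant):
    """Linear product of the quotient and the aging attenuation constant (S313)."""
    return temperature_discrete_quotient(temps) * aging_decay_constant


# Hypothetical two-sample window: mean 40, standard deviation 10.
quotient = temperature_discrete_quotient([30.0, 50.0])
risk = fault_warning_risk([30.0, 50.0], aging_decay_constant=1.2)
```

With the hypothetical values above the quotient is 0.25, and an assumed aging constant of 1.2 scales it to roughly 0.3.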

[0011] As a further aspect of the present invention, the step of obtaining the abnormal node resource allocation echelon is specifically as follows: S411: Obtain a preset risk boundary, perform a comparison and judgment between the fault warning risk probability and the preset risk boundary, lock the target computer room network device object whose fault warning risk probability value is greater than the preset risk boundary, establish a risk over-limit association mapping, and generate abnormal computer room network device nodes. S412: Extract the device identifier serial number corresponding to the abnormal data center network device node, retrieve the preset resource level label stored in the local database, execute the matching index of the abnormal data center network device node and the preset resource level label configuration item, construct the device identifier and resource weight mapping topology structure, and obtain the node resource level attribute matrix. S413: Based on the node resource level attribute matrix, collect the resource pool allocation topology structure inside the data center, and establish a multi-dimensional resource scheduling and allocation mapping chain for the abnormal data center network device nodes corresponding to different preset resource level labels to obtain the abnormal node resource allocation echelon.
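A minimal sketch of steps S411 to S413 follows, assuming that the risk probabilities and the resource level labels are both available as dictionaries keyed by a device identifier. The device identifiers, label names, and the `"unlabeled"` fallback are hypothetical; the patent does not specify the data structures or how missing labels are handled.

```python
from collections import defaultdict


def abnormal_nodes(risk_by_device, risk_boundary):
    """Keep only devices whose risk is strictly greater than the boundary (S411)."""
    return {dev: risk for dev, risk in risk_by_device.items() if risk > risk_boundary}


def allocation_echelon(abnormal, level_labels):
    """Group abnormal devices by resource level label (S412-S413).

    Within each label, devices are ordered from highest to lowest risk.
    Devices without a stored label fall into "unlabeled" (an assumption).
    """
    echelon = defaultdict(list)
    for dev, risk in abnormal.items():
        echelon[level_labels.get(dev, "unlabeled")].append((dev, risk))
    for nodes in echelon.values():
        nodes.sort(key=lambda item: item[1], reverse=True)
    return dict(echelon)


# Hypothetical inputs for illustration only.
risks = {"sw-core-01": 0.42, "sw-agg-03": 0.35, "sw-acc-07": 0.10}
labels = {"sw-core-01": "level-1", "sw-agg-03": "level-2", "sw-acc-07": "level-3"}
echelon = allocation_echelon(abnormal_nodes(risks, risk_boundary=0.3), labels)
```

Grouping by label first and sorting by risk inside each group keeps the hierarchy explicit, so a later step can map each label to its own set of response parameters.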

[0012] As a further aspect of the present invention, the steps for obtaining the intelligent monitoring scheme for computer room network equipment are as follows: S511: Obtain the standard solution for data center operation and maintenance, call the abnormal node resource allocation echelon, perform multi-dimensional feature matching between the abnormal node resource allocation echelon and the standard solution for data center operation and maintenance, establish a mapping relationship between gradient level and response level, and generate a set of node monitoring response parameters. S512: Based on the node monitoring response parameter set, monitor the real-time status of the control terminal corresponding to the abnormal computer room network device node, execute configuration instructions remotely and load logic for the control terminal corresponding to the abnormal computer room network device node, establish the mapping logic between monitoring parameters and controlled terminals, form a node-level dynamic monitoring execution chain, and establish a targeted monitoring execution strategy. S513: Based on the targeted monitoring execution strategy, collect the global topology view of the data center network, perform targeted real-time data collection and risk perception for all abnormal data center network device nodes, aggregate the associated monitoring feature information of all abnormal data center network device nodes, form an automated scheduling and control mechanism for the data center network environment, and obtain an intelligent monitoring solution for data center network devices.

[0013] An intelligent monitoring system for data center network equipment includes:
The deviation assessment module collects the bandwidth load of the network equipment in the computer room, calculates the difference between the load and the preset rated bandwidth load to obtain the load difference, divides the load difference by the preset rated bandwidth load to obtain the load deviation quotient, extracts the port throughput setting weight and multiplies it by the load deviation quotient to obtain the network equipment deviation ratio data.
The monitoring frequency allocation module extracts a preset monitoring frequency mapping matrix, substitutes the network equipment deviation ratio data into the preset monitoring frequency mapping matrix for cross-matching, and combines the preset frequency adjustment baseline to proportionally reduce the sampling interval of the computer room network equipment to obtain the early warning monitoring sampling frequency data.
The early warning risk calculation module collects, within a preset time window, the discrete time series of the operating temperature of the computer room network equipment at the early warning monitoring sampling frequency, calculates the arithmetic mean and the arithmetic square root of the discrete time series, divides the arithmetic square root by the arithmetic mean to obtain the temperature discrete quotient, multiplies the temperature discrete quotient by the equipment aging set attenuation constant, and outputs the fault early warning risk probability.
The resource allocation extraction module compares the fault early warning risk probability with a preset risk threshold, filters out the abnormal data center network equipment nodes that exceed the preset risk threshold, extracts the preset resource level tags matching those abnormal nodes, and obtains the abnormal node resource allocation echelon.
The intelligent monitoring output module matches the abnormal node resource allocation echelon with the data center operation and maintenance standard scheme, extracts the monitoring response parameters of the corresponding abnormal nodes, sends them to the control terminals of the corresponding abnormal data center network equipment nodes for targeted monitoring, and outputs the intelligent monitoring scheme for the data center network equipment.

[0014] Compared with the prior art, the advantages and positive effects of the present invention are as follows: In this invention, the pressure on data center network equipment is accurately quantified by fusion calculation of the bandwidth load deviation quotient and weight. Fine-grained data collection during high-risk periods is achieved by using the frequency mapping matrix and sampling interval reduction. Risk correction is carried out by combining operating temperature sequence analysis with an aging attenuation constant. This constructs a forward-looking fault early warning and evaluation system, effectively avoiding the risk of missing early warnings in the traditional threshold triggering mode. Hierarchical management of abnormal nodes is carried out based on resource level tags to ensure that the distribution of monitoring response parameters is targeted and real-time, thereby improving the intelligent operation and maintenance level of data center network equipment.

Attached Figure Description

[0015] Figure 1 is a flowchart of the main steps of the present invention;
Figure 2 is a flowchart of the network device deviation ratio data acquisition process of the present invention;
Figure 3 is a flowchart of the early warning monitoring sampling frequency data acquisition process of the present invention;
Figure 4 is a flowchart of the fault warning risk probability acquisition process of the present invention;
Figure 5 is a flowchart of the abnormal node resource allocation echelon acquisition process of the present invention;
Figure 6 is a flowchart of the acquisition process of the intelligent monitoring scheme for computer room network equipment of the present invention.

Detailed Implementation

[0016] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.

[0017] In the description of this invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientation or positional relationships, are based on the orientation or positional relationships shown in the accompanying drawings and are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, in the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.

[0018] Referring to Figure 1, a method for intelligent monitoring of network equipment in a computer room includes the following steps:
S1: Collect the bandwidth load of the network equipment in the computer room through sensors, calculate the difference between the load and the preset rated bandwidth load to obtain the load difference, divide the load difference by the preset rated bandwidth load to obtain the load deviation quotient, extract the port throughput setting weight and multiply it by the load deviation quotient to obtain the network equipment deviation ratio data.
S2: Extract the preset monitoring frequency mapping matrix, substitute the network equipment deviation ratio data into the preset monitoring frequency mapping matrix for cross-matching, and combine the preset frequency adjustment baseline to proportionally reduce the sensor sampling interval of the computer room network equipment to obtain the early warning monitoring sampling frequency data.
S3: Within the preset time window, collect the discrete time series of the operating temperature of the computer room network equipment at the early warning monitoring sampling frequency, calculate the arithmetic mean and the arithmetic square root of the discrete time series, divide the arithmetic square root by the arithmetic mean to obtain the temperature discrete quotient, multiply the temperature discrete quotient by the equipment aging set attenuation constant, and output the fault early warning risk probability.
S4: Compare the fault early warning risk probability with the preset risk threshold, filter out the abnormal data center network equipment nodes that exceed the preset risk threshold, extract the preset resource level tags matching those abnormal nodes, and obtain the abnormal node resource allocation echelon.
S5: Match the abnormal node resource allocation echelon with the data center operation and maintenance standard scheme, extract the monitoring response parameters of the corresponding abnormal nodes, send them to the control terminals of the corresponding abnormal data center network equipment nodes for targeted monitoring, and output the intelligent monitoring scheme for data center network equipment.

[0019] Network device deviation ratio data includes link saturation deviation and node load balancing; early warning monitoring sampling frequency data includes monitoring clock resolution, sampling density gradient, and time slice width; fault early warning risk probability includes heat dissipation failure risk and temperature drift fatigue coefficient; abnormal node resource allocation hierarchy includes abnormal node resource configuration records and fault isolation levels; intelligent monitoring solution for data center network equipment includes intelligent inspection topology and abnormal node status assessment records.

[0020] Referring to Figure 2, step S1 is as follows: S111: Obtain the preset rated bandwidth load, capture the bandwidth load of the data center network equipment, and subtract the captured bandwidth load from the preset rated bandwidth load to generate the load difference. To obtain the preset rated bandwidth load, the theoretical maximum transmission rate of each network device's physical port is extracted from the factory technical specifications of the core, aggregation, and access layer switches in the data center, or from a pre-built asset management system. Extraction is differentiated by device model. For example, the preset rated bandwidth load for the 100G optical port of a core switch is set to 100,000 Mbps, while for the gigabit electrical port of an access switch it is set to 1,000 Mbps. These values are stored in the configuration table of a local cache server. Subsequently, Simple Network Management Protocol (SNMP) Get-Request operations are sent periodically to the Management Information Base (MIB) of the data center network devices, collecting two 64-bit counter values: the total number of inbound bytes and the total number of outbound bytes. The collection frequency is once every 2 seconds. To ensure data accuracy, collection runs over an independent out-of-band management network so that business traffic does not interfere with the collection link. The instantaneous throughput is calculated from the byte difference between two consecutive sampling points divided by the 2-second sampling interval. To eliminate noise from network micro-burst traffic, the raw data undergoes denoising preprocessing. A mean filtering algorithm based on a sliding time window is used, with a window length of 5 sampling periods (10 seconds in total). The 5 values within the window are averaged, and abnormal spikes deviating from the mean by more than 30% are removed. The smoothed bandwidth load is then subtracted from the preset rated bandwidth load. Assuming the preset rated bandwidth load of a core switch port is 100,000 Mbps and the captured, smoothed bandwidth load is 65,000 Mbps, the load difference is 35,000 Mbps. This difference represents the remaining available bandwidth of the device and provides the raw difference data for subsequent load analysis.
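The acquisition arithmetic of S111 can be sketched as follows. This is an illustrative reading, not the patented implementation: the SNMP polling itself is omitted, the counter-wrap handling is an added assumption, and the spike rule is interpreted as "discard window values more than 30% away from the raw window mean, then average the rest". All function names are hypothetical.

```python
from statistics import fmean

SAMPLE_INTERVAL_S = 2        # polling period stated in the text
SPIKE_TOLERANCE = 0.30       # spikes more than 30% off the mean are dropped
COUNTER_MODULUS = 2 ** 64    # 64-bit byte counters wrap around (assumption)


def throughput_mbps(prev_bytes, curr_bytes, interval_s=SAMPLE_INTERVAL_S):
    """Instantaneous throughput from two consecutive byte-counter readings."""
    delta = (curr_bytes - prev_bytes) % COUNTER_MODULUS  # tolerates one wrap
    return delta * 8 / interval_s / 1_000_000


def smooth(window_values):
    """Mean filter over one window (5 samples in the text) with spike removal."""
    raw_mean = fmean(window_values)
    kept = [v for v in window_values
            if abs(v - raw_mean) <= SPIKE_TOLERANCE * raw_mean]
    return fmean(kept) if kept else raw_mean


def load_difference(rated_mbps, smoothed_mbps):
    """Remaining bandwidth: rated load minus the smoothed measured load."""
    return rated_mbps - smoothed_mbps


# Figures from the text: 100,000 Mbps rated, 65,000 Mbps measured.
print(load_difference(100_000, 65_000))  # 35000
```

Taking the counter difference modulo 2^64 is one common way to survive a counter rollover between two polls; whether the described system does this is not stated in the source.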

[0021] S112: Take the load difference as the variable to be processed and the preset rated bandwidth load as the benchmark divisor, establish the mapping of the load difference to its deviation ratio relative to the preset rated bandwidth load, and divide the two to calculate the load deviation quotient. The load difference is transmitted to the logic unit via the data bus as the variable to be processed, while the preset rated bandwidth load serves as the base divisor. Before the mapping is established, the two data items are standardized and aligned so that both values are expressed in Mbps. The load difference is then divided by the preset rated bandwidth load to obtain the load deviation quotient. For example, if the calculated load difference is 35,000 Mbps and the preset rated bandwidth load of the device is 100,000 Mbps, then 35,000 divided by 100,000 gives a load deviation quotient of 0.35. This process converts the absolute bandwidth margin into a relative redundancy ratio. To ensure the reliability of this value, the calculation result is limited to the closed range of 0 to 1. If the calculated load deviation quotient is greater than 1, the input data is considered abnormal and the S111 acquisition process is retried. The load deviation quotient directly reflects how much headroom the equipment has: a higher value indicates more bandwidth margin and lower operational risk, whereas a load deviation quotient below 0.1 means that the equipment's bandwidth utilization has exceeded 90%, indicating heavy-load operation. This value not only eliminates the dimensional differences between devices with different physical rate specifications but also provides a unified horizontal comparison benchmark for all network devices, enabling fair scheduling of switches of different specifications under the same monitoring logic.
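The division and range check of S112 can be illustrated with a short sketch. The text only says that a quotient above 1 is treated as abnormal input; rejecting negative quotients as well is an assumption that follows from the stated closed range of 0 to 1. The function names and the exception type are hypothetical.

```python
HEAVY_LOAD_QUOTIENT = 0.1  # below this, utilization exceeds 90% per the text


def load_deviation_quotient(load_difference_mbps, rated_mbps):
    """Load difference divided by the rated load, both in Mbps."""
    if rated_mbps <= 0:
        raise ValueError("rated bandwidth load must be positive")
    quotient = load_difference_mbps / rated_mbps
    if not 0.0 <= quotient <= 1.0:
        # The text asks for the S111 acquisition to be retried in this case.
        raise ValueError("quotient outside [0, 1]; re-run acquisition")
    return quotient


def is_heavy_load(quotient):
    """True when less than 10% of the rated bandwidth remains."""
    return quotient < HEAVY_LOAD_QUOTIENT


q = load_deviation_quotient(35_000, 100_000)  # example from the text
print(q, is_heavy_load(q))                    # 0.35 False
```

Raising an exception leaves the retry policy to the caller, which keeps the calculation itself free of I/O.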

[0022] S113: Collect the port throughput setting weights, establish a linear product relationship between the load deviation quotient and the port throughput setting weight, and generate the network device deviation ratio data. The port throughput setting weights are assigned dynamically based on the logical layer of each device in the data center topology and the importance of the services it carries. In the weight configuration table, the initial weights for core switches, aggregation switches, and access switches are set to 1.8, 1.2, and 0.8, respectively. These weight values are based on a 30-day simulated high-load experiment: the experimental data shows that, for the same fluctuation in bandwidth utilization, the impact of core layer devices on overall network stability is 2.25 times that of access layer devices. The load deviation quotient is then multiplied by the port throughput setting weight. Continuing the previous example, if the load deviation quotient of a core switch is 0.35 and its port throughput setting weight is 1.8, then 0.35 multiplied by 1.8 gives a network device deviation ratio of 0.63.
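The weighting step of S113 reduces to a table lookup and one multiplication. The sketch below uses the three initial weights quoted in the text; the dictionary keys and the function name are hypothetical.

```python
# Initial weights from the text; the layer keys are illustrative names.
PORT_THROUGHPUT_WEIGHTS = {"core": 1.8, "aggregation": 1.2, "access": 0.8}


def deviation_ratio(load_deviation_quotient, layer):
    """Linear product of the load deviation quotient and the layer weight."""
    return load_deviation_quotient * PORT_THROUGHPUT_WEIGHTS[layer]


ratio = deviation_ratio(0.35, "core")  # approximately 0.63, as in the text
core_vs_access = PORT_THROUGHPUT_WEIGHTS["core"] / PORT_THROUGHPUT_WEIGHTS["access"]
```

The ratio of the core weight to the access weight is 1.8 / 0.8 = 2.25, which matches the 2.25-fold impact figure quoted from the simulated experiment.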

[0023] Table 1 shows the distribution of deviation ratio data for devices at different levels after weight correction. Table 1: Initial Load Parameters for Network Devices [table not reproduced]. As shown in Table 1, linear adjustment of weights can significantly amplify the risk characteristics of core nodes under high load. Nodes with deviation ratios below 0.1 automatically trigger high-priority monitoring and alerts. By introducing weighting factors, the deviation ratio data is no longer merely a physical reflection of bandwidth, but a comprehensive risk indicator that integrates business importance.

[0024] Please see Figure 3. Step S2 is as follows: S211: Obtain the preset monitoring frequency mapping matrix, perform spatial dimension alignment processing on the network device deviation ratio data, extract the nonlinear feature distribution state inside the network device deviation ratio data, establish a multidimensional cross-matching relationship between the network device deviation ratio data and the preset monitoring frequency mapping matrix, identify the topology distribution set of the mapping processing nodes, and generate a frequency mapping matching vector. The preset monitoring frequency mapping matrix is stored in the configuration database of the data center monitoring platform and is a two-dimensional tensor of 10 rows and 10 columns. The row index represents the value range of the network device deviation ratio data (step size 0.1, covering 0 to 1.0), and the column index represents the deployment density level of the device in physical space (levels 1 to 10). For spatial dimension alignment, the coordinate mapping interface of the data center's 3D digital model (BIM) is used to obtain the rack number and specific U-position of each abnormal device, and this physical location is converted into the corresponding spatial density coefficient. The nonlinear feature distribution state is extracted through time series volatility analysis based on feature engineering: the second-order difference of the deviation ratio data over the past 60 minutes is computed to capture the acceleration of traffic changes and to identify whether periodic instantaneous congestion is present. The multidimensional cross-matching relationship is then established by retrieving the matrix: the calculated deviation ratio (e.g., 0.63) is cross-located with the corresponding spatial density coefficient (assuming a density level of 6). The topology distribution set of the mapping nodes is identified, and the coordinates of all nodes that meet the matching conditions are aggregated with the current link status to generate the frequency mapping matching vector. This vector contains 5 components, among them a basic frequency reference value (e.g., 60 seconds) and a spatial correction coefficient. It provides multidimensional input for the subsequent precise frequency adjustment.
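The cross-location and the second-order difference described in S211 can be illustrated with a minimal Python sketch. It assumes zero-based indices, half-open row bins of width 0.1, and that a ratio of exactly 1.0 falls into the last row. The function names `locate_cell` and `second_order_difference` are hypothetical, and the contents of the matrix cells are not specified in the text, so only the cell position is computed.

```python
import math
from typing import List, Sequence, Tuple

ROWS = 10        # deviation-ratio bins of width 0.1 covering 0 to 1.0
COLS = 10        # deployment density levels 1 to 10
ROW_STEP = 0.1


def locate_cell(deviation_ratio: float, density_level: int) -> Tuple[int, int]:
    """Return the zero-based (row, column) of the matrix cell for one device."""
    if not 0.0 <= deviation_ratio <= 1.0:
        raise ValueError("deviation ratio must lie in [0, 1.0]")
    if not 1 <= density_level <= COLS:
        raise ValueError("density level must lie in 1..10")
    # Round before flooring so that values such as 0.3 / 0.1 do not land
    # one bin too low because of floating-point error.
    row = math.floor(round(deviation_ratio / ROW_STEP, 9))
    row = min(row, ROWS - 1)  # assumption: a ratio of exactly 1.0 uses the last bin
    return row, density_level - 1


def second_order_difference(series: Sequence[float]) -> List[float]:
    """Discrete acceleration: x[i+2] - 2*x[i+1] + x[i] for each i."""
    return [
        series[i + 2] - 2 * series[i + 1] + series[i]
        for i in range(len(series) - 2)
    ]


row, col = locate_cell(0.63, 6)
print(row, col)                                # 6 5
print(second_order_difference([1, 4, 9, 16]))  # [2, 2]
```

With the example values from the text (deviation ratio 0.63, density level 6), the device maps to the seventh row and sixth column of the matrix.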

[0025] S212: Based on the frequency mapping matching vector, obtain the preset frequency adjustment baseline, collect the sensor energy consumption threshold, link load saturation, simulated packet loss rate, and sensor real-time power consumption, calculate the frequency adjustment correction index using the formula:

$$\lambda = \frac{E_{th}}{E_{rt}} \sqrt{\frac{\left| \lVert V \rVert \, f_b - f_b \right| \cdot S \cdot P}{f_b}}$$

and establish the logical association to generate the frequency change response weight; where $\lambda$ represents the frequency adjustment correction index, $V$ represents the frequency mapping matching vector, $f_b$ represents the frequency adjustment baseline, $S$ represents the link load saturation, $P$ represents the simulated packet loss rate, $E_{th}$ represents the sensor energy consumption threshold, and $E_{rt}$ represents the sensor real-time power consumption. The preset frequency adjustment baseline is set to 120 seconds, representing the standard inspection cycle under normal load. The sensor energy consumption threshold is determined by the rated power of the hardware sensor (such as a DS18B20 temperature probe or a current transformer) and is set to 150 mW. The link load saturation is currently 0.85, obtained by reading the port queue depth. The simulated packet loss rate is currently 0.03, calculated by sending 100 probe packets (64 bytes each) to the target node and counting the percentage that receive no response. The sensor real-time power consumption is currently 110 mW, obtained through the power measurement circuit of the embedded management unit. In this calculation, the modulus of the frequency mapping matching vector is taken as the reference term and is assumed to be 0.75. The worked example proceeds as follows. First, multiplying 0.75 by the baseline of 120 gives 90, and the absolute value of the difference between 90 and 120 is 30. Next, multiplying 30 by the link load saturation of 0.85 and the packet loss rate of 0.03 gives 0.765. Dividing 0.765 by 120 gives 0.006375, whose square root is approximately 0.0798. Finally, multiplying this square root by the energy ratio of 150 divided by 110 (approximately 1.36) gives a frequency adjustment correction index of approximately 0.1085. The advantage of this formula is that the square root operation dampens the effect of parameter fluctuations on the frequency adjustment, which keeps monitoring frequency switching smooth. At the same time, the energy ratio (threshold divided by real-time power consumption) acts as a gain coefficient: as the sensor's real-time power consumption approaches the threshold, the ratio falls toward 1 and the correction index shrinks, so the sampling interval is compressed less in the subsequent steps, which effectively extends the interval and reduces hardware losses.
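The worked example can be replayed with a short Python sketch. The order of operations is inferred from the arithmetic in the text: scale the baseline by the vector modulus, take the absolute difference from the baseline, weight it by saturation and packet loss, normalize by the baseline, take the square root, and apply the energy ratio. The function and parameter names are hypothetical.

```python
import math


def frequency_adjustment_index(
    vector_modulus: float,
    baseline_s: float,
    link_saturation: float,
    packet_loss_rate: float,
    power_threshold_mw: float,
    power_realtime_mw: float,
) -> float:
    """Frequency adjustment correction index, following the worked example."""
    if baseline_s <= 0 or power_realtime_mw <= 0:
        raise ValueError("baseline and real-time power must be positive")
    mapped = vector_modulus * baseline_s                        # 0.75 * 120 = 90
    deviation = abs(mapped - baseline_s)                        # |90 - 120| = 30
    load_term = deviation * link_saturation * packet_loss_rate  # 0.765
    smoothed = math.sqrt(load_term / baseline_s)                # sqrt(0.006375)
    energy_ratio = power_threshold_mw / power_realtime_mw       # 150 / 110
    return smoothed * energy_ratio


index = frequency_adjustment_index(0.75, 120.0, 0.85, 0.03, 150.0, 110.0)
print(round(index, 4))  # about 0.1089
```

Without intermediate rounding the result is about 0.1089. The figure of 0.1085 in the text follows from rounding the square root to 0.0798 and the energy ratio to 1.36 before multiplying.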

[0026] S213: Use the frequency change response weight to proportionally reduce the sampling interval of the computer room network equipment sensors, construct a dynamic adjustment step mapping structure for the sampling interval, and obtain the early warning monitoring sampling frequency data. The sampling interval of the network equipment sensors is defined as the physical period at which the underlying hardware timer triggers data reporting; its initial value is set to 300 seconds. The frequency change response weight is derived from the previously calculated frequency adjustment correction index by an exponential transformation; in this example it is set to 0.25 and is used to proportionally reduce the sampling interval. The dynamic adjustment step mapping structure is a state machine based on a lookup table, which defines the jump logic of the sampling frequency under different response weights so that frequency adjustments do not cause buffer overflow in the underlying driver. For example, multiplying the original sampling interval of 300 seconds by (1 minus 0.25) yields a new sampling interval of 225 seconds, which forms the early warning monitoring sampling frequency data.
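The proportional reduction in S213 amounts to one multiplication, sketched below in Python. The lower bound `min_interval_s` is an assumption added so that the interval cannot collapse to zero; the text does not state such a floor, and the lookup-table state machine is not modeled here.

```python
def reduce_sampling_interval(
    interval_s: float,
    response_weight: float,
    min_interval_s: float = 1.0,  # assumed floor, not taken from the text
) -> float:
    """Shrink the sampling interval in proportion to the response weight."""
    if not 0.0 <= response_weight < 1.0:
        raise ValueError("response weight must lie in [0, 1)")
    return max(interval_s * (1.0 - response_weight), min_interval_s)


print(reduce_sampling_interval(300.0, 0.25))  # 225.0
```

A larger response weight compresses the interval further, so higher-risk devices are sampled more often.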

[0027] Please refer to Table 2 below, which shows the test data for dynamically adjusting the sampling frequency under different load conditions. Table 2: Experimental data on dynamic adjustment of sampling frequency. As shown in Table 2, as the link load saturation and packet loss rate increase, the response weight also increases, and the sampling interval is compressed from 300 seconds to 105 seconds. This demonstrates the advantage of the dynamic adjustment step mapping structure in balancing real-time performance and energy efficiency.

[0028] Please see Figure 4. Step S3 is as follows: S311: Obtain a preset time window, set the sampling step according to the early warning monitoring sampling frequency data, collect the discrete time series of the operating temperature of the computer room network equipment within the preset time window, calculate the arithmetic mean and arithmetic square root of the discrete time series of the operating temperature of the computer room network equipment, and generate a sequence distribution statistical vector. The preset time window is set to 15 minutes (900 seconds) based on the peak-and-trough cycle of business operations. The sampling step follows the early warning monitoring sampling frequency data: with the currently calculated sampling interval of 225 seconds, 4 samples are taken within the 15-minute window. Within the window, the discrete time series of operating temperatures is collected by polling the Baseboard Management Controller (BMC) built into the switch for the real-time temperature of each monitoring point on the main control board. The data undergoes normalization preprocessing, which converts the Celsius values into relative scaling values based on the maximum allowable operating temperature of the equipment (e.g., 85℃). For example, the collected raw temperature series is [42, 45, 48, 44] (in ℃), whose arithmetic mean is 44.75℃. The arithmetic square root, i.e., the standard deviation of the series, is calculated at the same time and reflects the degree of temperature fluctuation and dispersion along the time axis. The calculation is as follows: sum the squared differences between each sampling point and the mean (18.75), divide by the number of samples (4) to obtain 4.6875, and take the square root, which gives approximately 2.16. Finally, a sequence distribution statistical vector is generated from these two statistics (here, a mean of 44.75 and a standard deviation of 2.16). This vector combines the characteristics of both heat accumulation level and thermal field stability.
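The statistics in S311 can be checked with a few lines of Python. This is a minimal sketch that follows the text in dividing by the number of samples (a population standard deviation) and that works on the raw Celsius values used in the example; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def window_sample_count(window_s: int, interval_s: int) -> int:
    """Number of whole sampling intervals that fit in the time window."""
    return window_s // interval_s


def sequence_statistics(temps_c: Sequence[float]) -> Tuple[float, float]:
    """Return (arithmetic mean, population standard deviation)."""
    n = len(temps_c)
    if n == 0:
        raise ValueError("temperature series must not be empty")
    mean = sum(temps_c) / n
    variance = sum((t - mean) ** 2 for t in temps_c) / n  # divide by n, as in the text
    return mean, math.sqrt(variance)


print(window_sample_count(900, 225))  # 4
mean, std = sequence_statistics([42, 45, 48, 44])
print(mean)  # 44.75
```

For this series the standard deviation is the square root of 4.6875, about 2.165, which the text reports as approximately 2.16.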

[0029] S312: Based on the sequence distribution statistical vector, divide the arithmetic square root by the arithmetic mean to establish a ratio relationship and obtain the temperature discrete quotient. Before the division, it is necessary to check whether the arithmetic mean is within the normal range (20℃ to 70℃). If the mean is abnormally low (e.g., below 10℃), the reading is treated as a sensor malfunction; if it is normal, the ratio is established and the temperature discrete quotient is obtained. In the example above, dividing the standard deviation of 2.16 by the mean of 44.75 yields a temperature discrete quotient of 0.0483. This quotient can identify microscopic heat fluctuations caused by a sudden increase in local ASIC chip load or by aging thermal grease, and it is more sensitive than a simple absolute temperature threshold alarm.
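The range check and division in S312 might look like the following Python sketch. The text only says what happens for a normal mean and for an abnormally low one, so the "out_of_range" status for other means is an assumption, as are the function name and the status strings.

```python
from typing import Optional, Tuple

NORMAL_RANGE_C = (20.0, 70.0)  # normal range of the mean, from the text
SENSOR_FAULT_BELOW_C = 10.0    # example limit for a sensor malfunction, from the text


def temperature_discrete_quotient(
    mean_c: float, std_c: float
) -> Tuple[str, Optional[float]]:
    """Return (status, quotient); the quotient is None on a sensor fault."""
    if mean_c < SENSOR_FAULT_BELOW_C:
        return "sensor_fault", None
    quotient = std_c / mean_c
    low, high = NORMAL_RANGE_C
    # Assumption: means outside the normal range still yield a quotient,
    # but are flagged so the caller can decide how to treat them.
    status = "normal" if low <= mean_c <= high else "out_of_range"
    return status, quotient


status, quotient = temperature_discrete_quotient(44.75, 2.16)
print(status, round(quotient, 4))  # normal 0.0483
```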

[0030] S313: Obtain the equipment aging set attenuation constant, establish a linear product relationship between the temperature discrete quotient and the equipment aging set attenuation constant, perform weight correction of the data center network equipment lifespan attenuation for the temperature discrete quotient, and generate the fault warning risk probability. The equipment aging attenuation constant is set in stages according to the number of hours the equipment has been online. In this example, the electronic tag data of a core switch shows that it has been running for 45,000 hours, and the experimental calibration curve gives a corresponding attenuation constant of 0.015. The fault warning risk probability is the product of the temperature discrete quotient, the aging attenuation constant, and a scaling factor (set to 1000): multiplying 0.0483 by 0.015 yields 0.0007245, and multiplying by 1000 gives a fault warning risk probability of 0.7245. This value quantifies the likelihood of hardware failure. Because of the aging attenuation constant, temperature fluctuations of the same intensity produce a higher risk score on older equipment, which is consistent with the physical fatigue behavior of hardware.
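The product in S313 is small enough to state directly in Python. This sketch takes the aging attenuation constant as an input instead of modeling the staged calibration curve, since only one point of that curve (45,000 hours, 0.015) appears in the text; the function name is hypothetical.

```python
def fault_warning_risk(
    discrete_quotient: float,
    aging_constant: float,
    scale: float = 1000.0,  # scaling factor given in the text
) -> float:
    """Linear product of quotient, aging constant, and scaling factor."""
    if discrete_quotient < 0 or aging_constant < 0:
        raise ValueError("inputs must be non-negative")
    return discrete_quotient * aging_constant * scale


risk = fault_warning_risk(0.0483, 0.015)
print(round(risk, 4))  # 0.7245
```

The scaled product is not bounded by 1 on its own (a quotient of 0.1 with the same constant gives 1.5), so an implementation that treats the result strictly as a probability would need to cap it; the text does not describe such a step.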

[0031] Please see Figure 5. Step S4 is as follows: S411: Obtain the preset risk boundary, perform a comparison and judgment between the fault warning risk probability and the preset risk boundary, lock the target computer room network device object whose fault warning risk probability value is greater than the preset risk boundary, establish a risk over-limit association mapping, and generate abnormal computer room network device nodes. The preset risk boundary (threshold) is set to 0.70 based on the Service Level Agreement (SLA) for data center operations. In this embodiment, the calculated fault warning risk probability of 0.7245 is compared with 0.70. Target data center network devices whose fault warning risk probability exceeds the preset risk boundary are locked and determined to be in a high-risk state. A risk over-limit association mapping is established: the device's management IP address, its uplink and downlink ports in the physical topology, and the data center area identifier are encapsulated in a structured form to generate an abnormal data center network device node. This node serves not only as a warning signal but also as a logical anchor point for subsequent resource scheduling.
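The comparison and encapsulation in S411 can be sketched in Python as a filter that produces structured node records. The field names, port names, area identifiers, and IP addresses (taken from the 192.0.2.0/24 documentation range) are illustrative assumptions; only the 0.70 boundary and the 0.7245 risk value come from the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AbnormalNode:
    management_ip: str
    uplink_port: str
    downlink_port: str
    area_id: str
    risk: float


def find_abnormal_nodes(
    devices: List[Dict], risk_boundary: float = 0.70
) -> List[AbnormalNode]:
    """Keep devices whose risk is strictly greater than the boundary."""
    return [
        AbnormalNode(
            management_ip=d["management_ip"],
            uplink_port=d["uplink_port"],
            downlink_port=d["downlink_port"],
            area_id=d["area_id"],
            risk=d["risk"],
        )
        for d in devices
        if d["risk"] > risk_boundary
    ]


devices = [  # hypothetical inventory records
    {"management_ip": "192.0.2.10", "uplink_port": "eth1/1",
     "downlink_port": "eth1/24", "area_id": "A-01", "risk": 0.7245},
    {"management_ip": "192.0.2.11", "uplink_port": "eth1/2",
     "downlink_port": "eth1/25", "area_id": "A-02", "risk": 0.41},
]
nodes = find_abnormal_nodes(devices)
print([n.management_ip for n in nodes])  # ['192.0.2.10']
```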

[0032] S412: Extract the device identifier serial number corresponding to the abnormal data center network device node, retrieve the preset resource level label stored in the local database, execute the matching index between the abnormal data center network device node and the preset resource level label configuration item, construct the device identifier and resource weight mapping topology structure, and obtain the node resource level attribute matrix. The device identification serial number (SN) of the abnormal node is extracted and used as a unique identifier to retrieve the preset resource level label from the local cache database. The labels in the database are divided into core resources (Level 1), aggregation resources (Level 2), and general resources (Level 3). Index matching is then performed between the abnormal node and the preset resource level label configuration items: if the SN belongs to a core switch carrying financial transaction traffic, the node is matched to Level 1. A topology structure mapping device identifiers to resource weights is constructed, and devices of different levels are assigned different processing weights, for example 0.98 for Level 1 and 0.50 for Level 3. The resulting node resource level attribute matrix stores the asset importance ranking of all abnormal nodes in the network as a sparse matrix, so that limited operation and maintenance resources can be allocated to the most critical business nodes.
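The label lookup and sparse attribute matrix in S412 can be illustrated with a dictionary-of-keys representation in Python. The serial numbers and the choice of (node row, level) as the matrix key are assumptions. The Level 2 weight is omitted because the text does not state it.

```python
from typing import Dict, Sequence, Tuple


def build_attribute_matrix(
    abnormal_sns: Sequence[str],
    level_by_sn: Dict[str, int],
    weight_by_level: Dict[int, float],
) -> Dict[Tuple[int, int], float]:
    """Sparse matrix: (node row, resource level) -> processing weight."""
    matrix: Dict[Tuple[int, int], float] = {}
    for row, sn in enumerate(abnormal_sns):
        level = level_by_sn[sn]  # raises KeyError for an unknown serial number
        matrix[(row, level)] = weight_by_level[level]
    return matrix


# Hypothetical serial numbers; weights for Level 1 and Level 3 are from the text.
level_by_sn = {"SN-CORE-001": 1, "SN-EDGE-017": 3}
weight_by_level = {1: 0.98, 3: 0.50}

matrix = build_attribute_matrix(
    ["SN-CORE-001", "SN-EDGE-017"], level_by_sn, weight_by_level
)
print(matrix)  # {(0, 1): 0.98, (1, 3): 0.5}
```

Only the cells that correspond to an abnormal node and its level are stored, which matches the sparse storage described in the text.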

[0033] S413: Based on the node resource level attribute matrix, collect the resource pool allocation topology structure inside the data center, establish a multi-dimensional resource scheduling and allocation mapping chain for abnormal data center network device nodes corresponding to different preset resource level labels, and obtain the abnormal node resource allocation echelon. Based on the node resource level attribute matrix, the resource pool allocation topology within the data center is collected, including the location distribution of idle redundant links, backup power modules, and automated operation and maintenance robots. A multi-dimensional resource scheduling and allocation mapping chain is established for abnormal data center network device nodes corresponding to different preset resource level labels. In the scheduling logic, nodes with a weight value higher than 0.9 are prioritized. For example, when a core switch (weight 0.98) issues an alert, the scheduling chain immediately locks its physically adjacent backup fiber optic path and pre-issues a flow table redirection command in the SDN controller. Edge switches with lower resource weights are assigned to a low-priority observation queue. This process yields the abnormal node resource allocation hierarchy.
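The priority split described in S413 can be sketched as a sort followed by a partition. In this Python sketch the node names are hypothetical, and the two-queue model is a simplification; the actual locking of backup paths and the SDN redirection commands are outside its scope.

```python
from typing import Dict, List, Tuple


def build_allocation_echelon(
    weight_by_node: Dict[str, float], priority_cutoff: float = 0.9
) -> Tuple[List[str], List[str]]:
    """Split nodes into (priority queue, observation queue) by resource weight."""
    ranked = sorted(weight_by_node.items(), key=lambda item: item[1], reverse=True)
    priority = [node for node, weight in ranked if weight > priority_cutoff]
    observation = [node for node, weight in ranked if weight <= priority_cutoff]
    return priority, observation


priority, observation = build_allocation_echelon(
    {"edge-sw-17": 0.50, "core-sw-01": 0.98}  # hypothetical node names
)
print(priority)     # ['core-sw-01']
print(observation)  # ['edge-sw-17']
```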

[0034] Please see Figure 6. Step S5 is as follows: S511: Obtain the standard data center operation and maintenance plan, call the abnormal node resource allocation echelon, perform multi-dimensional feature matching between the abnormal node resource allocation echelon and the standard data center operation and maintenance plan, establish a mapping relationship between gradient level and response level, and generate a node monitoring response parameter set. The standard data center operation and maintenance plan includes processing scripts for more than 50 preset scenarios, such as overheating, packet loss, and intermittent link disconnections. The abnormal node resource allocation echelon is called and matched against this plan across multiple feature dimensions. A semantic matching engine extracts feature tags for the abnormal nodes, such as "high-risk probability," "high resource weight," and "abnormal temperature discrete quotient," and establishes the mapping relationship between gradient levels and response levels. For example, nodes with a risk probability higher than 0.7 that belong to Level 1 resources are mapped to the "ultra-fast response" level. Finally, the node monitoring response parameter set is generated. It contains specific control parameters, such as setting the port speed reduction ratio to 50%, increasing the fan speed to 100%, and reducing the monitoring sampling step to 1 second.
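The example mapping in S511 can be written as a small rule in Python. This sketch covers only the single rule stated in the text (risk above 0.7 and Level 1 maps to "ultra-fast response"); returning `None` for every other combination is an assumption, as are the class and field names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResponseParameters:
    response_level: str
    port_speed_reduction_pct: int
    fan_speed_pct: int
    sampling_step_s: int


def match_response(risk: float, resource_level: int) -> Optional[ResponseParameters]:
    """Apply the one mapping rule given in the text; other cases are left open."""
    if risk > 0.7 and resource_level == 1:
        return ResponseParameters(
            response_level="ultra-fast response",
            port_speed_reduction_pct=50,
            fan_speed_pct=100,
            sampling_step_s=1,
        )
    return None  # assumption: no rule is defined here for other combinations


params = match_response(0.7245, 1)
print(params.response_level)  # ultra-fast response
```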

[0035] S512: Based on the node monitoring response parameter set, monitor the real-time status of the control terminal corresponding to the abnormal network equipment node in the data center, remotely issue configuration instructions to that control terminal and load the corresponding logic, establish the mapping logic between monitoring parameters and controlled terminals, form a node-level dynamic monitoring execution chain, and establish targeted monitoring execution strategies. The system monitors the real-time status of the control terminals corresponding to the abnormal nodes and attempts to establish a management session via the Secure Shell protocol (SSH) or the Network Configuration Protocol (NETCONF). Configuration commands are then issued remotely and loaded on those control terminals: XML-formatted configuration messages are sent automatically to the target switch to adjust its operating parameters dynamically. The mapping logic between monitoring parameters and controlled terminals forms a node-level dynamic monitoring execution chain. For example, upon receiving a speed-reduction command, the underlying chip of the switch immediately adjusts the physical rate negotiation of the specified interface to reduce power consumption and heat generation. Targeted monitoring execution strategies are thereby established so that each abnormal node receives customized treatment.

[0036] S513: Based on the targeted monitoring execution strategy, collect the global topology view of the data center network, perform targeted real-time data collection and risk perception for all abnormal data center network device nodes, aggregate the associated monitoring feature information of all abnormal data center network device nodes, form an automated scheduling and control mechanism for the data center network environment, and obtain an intelligent monitoring solution for data center network devices. Based on targeted monitoring strategies, a global network topology view of the data center is collected, and the latest snapshot of the entire network's adjacency relationships is obtained using the LLDP protocol. Targeted real-time data collection and risk awareness are performed on all abnormal data center network device nodes, observing the risk probability regression trend after parameter adjustments (such as speed reduction or speed increase). The associated monitoring feature information of all abnormal data center network device nodes is aggregated to form an automated scheduling and control mechanism for the data center network environment. This mechanism dynamically adjusts the operation and maintenance plan based on the processing results through a continuous feedback loop, resulting in an intelligent monitoring solution for data center network devices.

[0037] Please refer to Table 3 below, which shows the comparative test data of this solution after running in a real data center environment for 3 months. Table 3: Actual Measurement Results of Intelligent Monitoring Solution Operation. As shown in Table 3, by implementing the intelligent monitoring scheme of this embodiment, the data center can reduce its own traffic and computing load while ensuring business continuity. This result shows that correcting multi-dimensional weights and dynamically adapting the sampling frequency achieves higher risk coverage at a lower resource cost. An automated scheduling and management mechanism for the data center network environment is thereby established, providing reliable technical support for the efficient operation and maintenance of large-scale data centers.

[0038] A smart monitoring system for data center network equipment includes:
The deviation assessment module, which collects the bandwidth load of the network equipment in the computer room, calculates the difference between the load and the preset rated bandwidth load to obtain the load difference, divides the load difference by the preset rated bandwidth load to obtain the load deviation quotient, and extracts the port throughput set weight and multiplies it by the load deviation quotient to obtain the network equipment deviation ratio data.
The monitoring frequency allocation module, which extracts the preset monitoring frequency mapping matrix, substitutes the network device deviation ratio data into the preset monitoring frequency mapping matrix for cross-matching, and combines the preset frequency adjustment baseline to proportionally reduce the sampling interval of the computer room network devices to obtain the early warning monitoring sampling frequency data.
The early warning risk calculation module, which collects the discrete time series of the operating temperature of the network equipment in the computer room according to the early warning monitoring sampling frequency within a preset time window, calculates the arithmetic mean and arithmetic square root of the discrete time series, divides the arithmetic square root by the mean to obtain the temperature discrete quotient, multiplies the temperature discrete quotient by the equipment aging set decay constant, and outputs the fault warning risk probability.
The resource allocation extraction module, which compares the fault warning risk probability with the preset risk threshold, filters out abnormal data center network equipment nodes that exceed the preset risk threshold, extracts the matching preset resource level tags of the corresponding abnormal data center network equipment nodes, and obtains the abnormal node resource allocation tier.
The intelligent monitoring output module, which matches the abnormal node resource allocation tier with the standard operation and maintenance scheme of the data center, extracts the monitoring response parameters of the corresponding abnormal nodes, sends them to the control terminals of the corresponding abnormal data center network equipment nodes for targeted monitoring, and outputs the intelligent monitoring scheme of the data center network equipment.
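As an illustration of the arithmetic performed by the deviation assessment module, the Python sketch below computes the load difference, the load deviation quotient, and the weighted deviation ratio. The sign convention (measured load minus rated load) is an assumption, since the text only speaks of "the difference" between the two, and the numeric inputs are hypothetical values chosen for easy checking.

```python
def deviation_ratio(
    bandwidth_load: float,
    rated_load: float,
    port_throughput_weight: float,
) -> float:
    """Weighted load deviation: weight * (load - rated) / rated."""
    if rated_load <= 0:
        raise ValueError("rated bandwidth load must be positive")
    load_difference = bandwidth_load - rated_load   # assumed sign convention
    load_deviation_quotient = load_difference / rated_load
    return port_throughput_weight * load_deviation_quotient


# Hypothetical inputs: 12 Gbit/s measured on an 8 Gbit/s rating, weight 1.5.
print(deviation_ratio(12.0, 8.0, 1.5))  # 0.75
```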

[0039] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments that can be applied to other fields. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.

Claims

1. A method for intelligent monitoring of network equipment in a computer room, characterized in that it includes the following steps:
S1: Collect the bandwidth load of the network equipment in the computer room, calculate the difference between the load and the preset rated bandwidth load to obtain the load difference, divide the load difference by the preset rated bandwidth load to obtain the load deviation quotient, extract the port throughput setting weight and multiply it by the load deviation quotient to obtain the network equipment deviation ratio data;
S2: Extract the preset monitoring frequency mapping matrix, substitute the network device deviation ratio data into the preset monitoring frequency mapping matrix for cross-matching, and combine the preset frequency adjustment baseline to proportionally reduce the sampling interval of the computer room network device to obtain the early warning monitoring sampling frequency data;
S3: Collect the discrete time series of the operating temperature of the network equipment in the computer room according to the early warning monitoring sampling frequency within the preset time window, calculate the arithmetic mean and arithmetic square root of the discrete time series, divide the arithmetic square root by the mean to obtain the temperature discrete quotient, multiply the temperature discrete quotient by the equipment aging set decay constant, and output the fault warning risk probability;
S4: Compare the fault warning risk probability with the preset risk threshold, filter out the abnormal computer room network equipment nodes that exceed the preset risk threshold, extract the matching preset resource level tags of the corresponding abnormal computer room network equipment nodes, and obtain the abnormal node resource allocation echelon;
S5: Match the abnormal node resource allocation echelon with the data center operation and maintenance standard scheme, extract the monitoring response parameters of the corresponding abnormal nodes, send them to the control terminals of the corresponding abnormal data center network equipment nodes for targeted monitoring, and output the data center network equipment intelligent monitoring scheme.

2. The intelligent monitoring method for computer room network equipment according to claim 1, characterized in that: the network device deviation ratio data includes link saturation deviation and node load balancing; the early warning monitoring sampling frequency data includes monitoring clock resolution, sampling density gradient, and time slice width; the fault warning risk probability includes heat dissipation failure risk and temperature drift fatigue coefficient; the abnormal node resource allocation hierarchy includes abnormal node resource configuration records and fault isolation levels; and the intelligent monitoring scheme for data center network equipment includes intelligent inspection topology and abnormal node status assessment records.

3. The intelligent monitoring method for computer room network equipment according to claim 1, characterized in that the specific steps for obtaining the network device deviation ratio data are as follows:
S111: Obtain the preset rated bandwidth load, capture the bandwidth load of the data center network equipment, and perform a subtraction operation between the bandwidth load of the data center network equipment and the preset rated bandwidth load to generate a load difference;
S112: Using the load difference as the variable to be processed and the preset rated bandwidth load as the benchmark divisor, establish a mapping of the numerical deviation ratio of the load difference relative to the preset rated bandwidth load, and divide the two to calculate the load deviation quotient;
S113: Collect the port throughput setting weights, establish a linear product logical relationship between the load deviation quotient and the port throughput setting weights, and generate the network device deviation ratio data.

4. The intelligent monitoring method for computer room network equipment according to claim 1, characterized in that the specific steps for obtaining the early warning monitoring sampling frequency data are as follows:
S211: Obtain a preset monitoring frequency mapping matrix, perform spatial dimension alignment processing on the network device deviation ratio data, extract the nonlinear feature distribution state inside the network device deviation ratio data, establish a multidimensional cross-matching relationship between the network device deviation ratio data and the preset monitoring frequency mapping matrix, identify the topology distribution set of the mapping processing nodes, and generate a frequency mapping matching vector;
S212: Based on the frequency mapping matching vector, obtain the preset frequency adjustment baseline, collect the sensor energy consumption threshold, link load saturation, simulated data packet loss rate, and sensor real-time power consumption, calculate the frequency adjustment correction index and establish the logical association, and generate the frequency change response weight;
S213: Use the frequency change response weight to proportionally reduce the sampling interval of the computer room network equipment sensors, construct a dynamic adjustment step mapping structure for the sampling interval, and obtain the early warning monitoring sampling frequency data.

5. The intelligent monitoring method for computer room network equipment according to claim 1, characterized in that the specific steps for obtaining the fault warning risk probability are as follows:
S311: Obtain a preset time window, set the sampling step according to the early warning monitoring sampling frequency data, collect the discrete time series of the operating temperature of the computer room network equipment within the preset time window, calculate the arithmetic mean and arithmetic square root of the discrete time series of the operating temperature of the computer room network equipment, and generate a sequence distribution statistical vector;
S312: Based on the sequence distribution statistical vector, divide the arithmetic square root by the arithmetic mean to establish a ratio correlation and obtain the temperature discrete quotient;
S313: Obtain the device aging set attenuation constant, establish a linear product relationship between the temperature discrete quotient and the device aging set attenuation constant, perform a weight correction of the lifespan attenuation of the data center network equipment for the temperature discrete quotient, and generate the fault warning risk probability.

6. The intelligent monitoring method for computer room network equipment according to claim 1, characterized in that the specific steps for obtaining the abnormal node resource allocation echelon are as follows:
S411: Obtain a preset risk boundary, perform a comparison and judgment between the fault warning risk probability and the preset risk boundary, lock the target computer room network device object whose fault warning risk probability value is greater than the preset risk boundary, establish a risk over-limit association mapping, and generate abnormal computer room network device nodes;
S412: Extract the device identifier serial number corresponding to the abnormal data center network device node, retrieve the preset resource level label stored in the local database, execute the matching index of the abnormal data center network device node and the preset resource level label configuration item, construct the device identifier and resource weight mapping topology structure, and obtain the node resource level attribute matrix;
S413: Based on the node resource level attribute matrix, collect the resource pool allocation topology structure inside the data center, and establish a multi-dimensional resource scheduling and allocation mapping chain for the abnormal data center network device nodes corresponding to different preset resource level labels to obtain the abnormal node resource allocation echelon.

7. The intelligent monitoring method for computer room network equipment according to claim 1, characterized in that the specific steps for obtaining the intelligent monitoring solution for the computer room network equipment are as follows:
S511: Obtain the standard solution for data center operation and maintenance, call the abnormal node resource allocation echelon, perform multi-dimensional feature matching between the abnormal node resource allocation echelon and the standard solution for data center operation and maintenance, establish a mapping relationship between gradient level and response level, and generate a node monitoring response parameter set;
S512: Based on the node monitoring response parameter set, monitor the real-time status of the control terminal corresponding to the abnormal computer room network device node, remotely issue configuration instructions to that control terminal and load the corresponding logic, establish the mapping logic between monitoring parameters and controlled terminals, form a node-level dynamic monitoring execution chain, and establish a targeted monitoring execution strategy;
S513: Based on the targeted monitoring execution strategy, collect the global topology view of the data center network, perform targeted real-time data collection and risk perception for all abnormal data center network device nodes, aggregate the associated monitoring feature information of all abnormal data center network device nodes, form an automated scheduling and control mechanism for the data center network environment, and obtain an intelligent monitoring solution for data center network devices.

8. An intelligent monitoring system for computer room network equipment, characterized in that the system is used to implement the intelligent monitoring method for computer room network equipment according to any one of claims 1-7 and comprises: a deviation assessment module, which collects the bandwidth load of the computer room network equipment, calculates the difference between the bandwidth load and the preset rated bandwidth load to obtain the load difference, divides the load difference by the preset rated bandwidth load to obtain the load deviation quotient, and extracts the preset port throughput weight and multiplies it by the load deviation quotient to obtain the network equipment deviation ratio data; a monitoring frequency allocation module, which extracts a preset monitoring frequency mapping matrix, substitutes the network equipment deviation ratio data into the preset monitoring frequency mapping matrix for cross-matching, and, in combination with the preset frequency adjustment baseline, proportionally reduces the sampling interval of the computer room network equipment to obtain the early warning monitoring sampling frequency data; an early warning risk calculation module, which, within a preset time window, collects the discrete time series of the operating temperature of the computer room network equipment at the early warning monitoring sampling frequency, calculates the arithmetic mean and the arithmetic square root of the discrete time series, divides the arithmetic square root by the arithmetic mean to obtain the temperature discrete quotient, multiplies the temperature discrete quotient by the preset equipment aging decay constant, and outputs the fault early warning risk probability; a resource allocation extraction module, which compares the fault early warning risk probability with a preset risk threshold, filters out the abnormal computer room network device nodes that exceed the preset risk threshold, extracts the preset resource level labels matching those abnormal computer room network device nodes, and obtains the abnormal node resource allocation echelon; and an intelligent monitoring output module, which matches the abnormal node resource allocation echelon against the computer room operation and maintenance standard solution, extracts the monitoring response parameters of the corresponding abnormal nodes, sends them to the control terminals corresponding to the abnormal computer room network device nodes for targeted monitoring, and outputs the intelligent monitoring solution for the computer room network equipment.
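
The arithmetic performed by the first three modules can be sketched in Python as follows. Several points are assumptions rather than statements of the claim: the "arithmetic square root" of the temperature series is interpreted here as its population standard deviation, the monitoring frequency mapping matrix is modeled as a list of hypothetical (upper bound, scale factor) rows, and the frequency adjustment baseline is treated as a baseline sampling interval in seconds. All function and parameter names are illustrative.

```python
import statistics


def deviation_ratio(load, rated_load, port_weight):
    """Load deviation quotient multiplied by the port throughput weight."""
    load_difference = load - rated_load
    load_deviation_quotient = load_difference / rated_load
    return load_deviation_quotient * port_weight


def sampling_interval(ratio, mapping, baseline_s):
    """Scale the baseline interval by the factor of the first matching row.

    mapping: rows of (upper_bound, scale_factor) sorted by upper_bound;
             an assumed encoding of the frequency mapping matrix.
    """
    for upper_bound, factor in mapping:
        if ratio <= upper_bound:
            return baseline_s * factor
    # Ratios above every bound use the last (most aggressive) factor.
    return baseline_s * mapping[-1][1]


def risk_probability(temperatures, aging_constant):
    """Temperature discrete quotient times the aging decay constant.

    Assumes a positive mean temperature; the spread is taken to be the
    population standard deviation (an interpretation, see lead-in).
    """
    mean = statistics.fmean(temperatures)
    spread = statistics.pstdev(temperatures)
    return (spread / mean) * aging_constant


MAPPING = [(0.0, 1.0), (0.2, 0.5), (0.5, 0.25)]  # hypothetical values
ratio = deviation_ratio(load=1200, rated_load=1000, port_weight=0.5)
interval = sampling_interval(ratio, MAPPING, baseline_s=60)
risk = risk_probability([30, 50], aging_constant=2.0)
```

The product in `risk_probability` is not clamped to the range 0 to 1, because the claim does not describe any normalization; a real system would need to state how the value is bounded before comparing it with the risk threshold.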