A method for identifying server malfunctions

By acquiring voltage and current waveforms under server load step commands, analyzing transient response characteristics, and performing time-frequency signal decomposition, the problem of difficulty in identifying server hardware faults in existing technologies is solved, enabling accurate assessment and early warning of the health status of the power supply circuit.

CN122309205A · Pending · Publication Date: 2026-06-30 · 百信信息技术有限公司

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
百信信息技术有限公司
Filing Date
2026-01-23
Publication Date
2026-06-30

AI Technical Summary

Technical Problem

Existing server anomaly monitoring technologies struggle to identify microsecond-level voltage fluctuations and current surges, leading to hardware failures going undetected in a timely manner. Furthermore, they are prone to false alarms when the power grid environment fluctuates, making it difficult to accurately pinpoint the source of the fault.

Method used

By collecting instantaneous voltage and current waveforms during the transient process of the server executing a load step command, analyzing the electrical characteristics within the transient response time window, and combining time-frequency signal decomposition and spectral correlation analysis to eliminate common-mode interference, it is possible to determine whether there are physical operational abnormalities in the server's internal hardware.

Benefits of technology

It enables the assessment of the dynamic health of the power supply circuit, allowing for early identification of hardware faults, reducing false alarm rates, and improving monitoring accuracy in fluctuating power grid environments.

✦ Generated by Eureka AI based on patent content.

Figure CN122309205A_ABST

Abstract

This invention discloses a method for identifying server operational anomalies, relating to the field of electrical fault diagnosis technology. It addresses the problems of difficulty in detecting dynamic performance degradation of power supplies and high false alarm rates due to environmental noise. First, instantaneous voltage and current waveforms are synchronously acquired during load step transients. The transient response window is defined based on the current ramp-up slope, and the voltage drop amplitude is extracted. Then, the response window is aligned with the command trigger moment in the time domain, the lag time is calculated, and correlation analysis is performed with the voltage drop amplitude to accurately determine dynamic adjustment anomalies in the power supply circuit. If the dynamic adjustment is normal, the fundamental frequency of the instantaneous current waveform is removed, and a time-frequency decomposition algorithm is applied to calculate the energy spectral density and spectral entropy values of each independent frequency band, constructing a frequency domain feature vector. Finally, the spectral correlation of adjacent nodes on the same busbar is used to separate common-mode interference from the external power grid, and internal hardware physical anomalies are determined only based on the remaining differential-mode components, achieving accurate identification of server power supply health and hardware faults.

Description

Technical Field

[0001] This invention relates to the field of electrical fault diagnosis technology, specifically a method for identifying abnormal server operation.

Background Technology

[0002] With the widespread deployment of digital infrastructure, server clusters have become the core carriers supporting data computing and storage. The operational stability of their hardware systems directly affects business continuity and data security. Under long-term high-load operating environments, the power supply units, motherboard voltage regulator modules, and various active components inside the server inevitably experience performance degradation. Once these underlying hardware components develop hidden faults that go undetected, they can easily trigger sudden system crashes or data corruption. Therefore, accurate monitoring of server power quality and circuit health is crucial to ensuring high system availability.

[0003] However, most existing server anomaly monitoring technologies rely on operating system-level log analysis or simple steady-state voltage and current readings. These methods have significant limitations when facing deep hardware faults: on the one hand, average-based steady-state monitoring masks microsecond-level voltage fluctuations and current surges, making it difficult to detect the degradation of the power supply circuit's dynamic regulation capability, resulting in the failure to identify "sub-healthy" states such as capacitor aging and dynamic response lag; on the other hand, existing monitoring methods struggle to effectively distinguish between fault noise generated inside the server and environmental interference introduced by the external power grid. When there are fluctuations in the power supply environment, false alarms are easily generated, making it impossible to accurately pinpoint the source of the fault and failing to meet the stringent requirements of early warning of hardware faults in high-precision computing environments.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, this invention provides a method for identifying server malfunctions, thus resolving the problems described in the background section.

[0005] To achieve the above objectives, the present invention provides the following technical solution: A method for identifying server malfunctions, comprising the following steps:

S1. During the transient process of the server executing a load step command, an instantaneous voltage waveform and an instantaneous current waveform are acquired by a power acquisition unit connected to the power input circuit. The transient response time window before and after the load change is defined based on the ramp slope of the instantaneous current waveform, and the voltage drop amplitude within this time window is extracted as an electrical response feature;

S2. The transient response time window is time-domain aligned with the triggering time of the load step command. The lag time from the command triggering time to the moment when the instantaneous current waveform first reaches the steady-state threshold is calculated, and the lag time is correlated with the voltage drop amplitude. When the lag time or the voltage drop amplitude exceeds a preset dynamic damping range...

S3. If the dynamic adjustment of the power supply circuit is normal, the instantaneous current waveform is processed to remove the fundamental wave to retain the high-frequency conducted interference component. The high-frequency conducted interference component is divided into multiple independent frequency band subspaces by applying a time-frequency signal decomposition algorithm, and the signal energy spectral density and spectral entropy value in each frequency band subspace are calculated to construct a frequency domain feature vector characterizing the current electromagnetic noise characteristics of the server.

S4. When the frequency domain feature vector shows abnormal fluctuations in spectral entropy value, the reference frequency domain feature vector of the adjacent node on the same power supply bus is read. By calculating the correlation between the frequency domain feature vector and the reference frequency domain feature vector in the corresponding frequency band, the common-mode interference component introduced by the external power grid is separated. The server internal hardware is judged to have physical operation abnormalities based only on the remaining differential-mode interference component.

[0006] Furthermore, the specific process of defining the transient response time window before and after a load change based on the ramp-up slope of the instantaneous current waveform, and extracting the voltage drop amplitude within this time window as an electrical response feature, is as follows: The instantaneous current waveform data acquired by the power acquisition unit is processed by first-order differentiation to generate a current rate of change sequence. This sequence is scanned, and the moment when the rate of change amplitude first exceeds a preset trigger threshold is identified as the start time of the transient response. Starting from this moment, the decay trend of the current rate of change sequence is continuously monitored. When the rate of change amplitude continuously converges within a preset steady-state noise tolerance band, this moment is marked as the end time of the transient response, thus defining a closed transient response time window. The arithmetic mean of the instantaneous voltage waveforms before the start time of the transient response is used as the reference voltage. The instantaneous voltage waveform data within the transient response time window is traversed, and the local minimum point of the voltage amplitude is searched and identified. The difference between the reference voltage and the local minimum point is calculated, and this difference is defined as the voltage drop amplitude.
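The window definition and voltage drop extraction described in this paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `settle_samples` parameter (how many consecutive in-band samples count as "continuously converged"), and the sample data are all hypothetical.

```python
from statistics import mean


def transient_window(current, trigger_threshold, noise_band, settle_samples=3):
    """Return (start, end) indices of the transient window, or None if no step is seen."""
    # First-order difference approximates the current rate of change.
    rate = [current[i + 1] - current[i] for i in range(len(current) - 1)]
    # Start: first sample whose rate magnitude exceeds the trigger threshold.
    start = next((i for i, r in enumerate(rate) if abs(r) > trigger_threshold), None)
    if start is None:
        return None
    # End: first index after start from which the rate stays inside the noise
    # band for `settle_samples` consecutive samples (assumed convergence test).
    for i in range(start + 1, len(rate) - settle_samples + 1):
        if all(abs(r) <= noise_band for r in rate[i:i + settle_samples]):
            return start, i
    return start, len(rate)  # never settled within the record


def voltage_drop(voltage, start, end):
    """Pre-transient mean voltage minus the minimum voltage inside the window."""
    reference = mean(voltage[:start])  # requires at least one pre-transient sample
    return reference - min(voltage[start:end + 1])


current = [1.0, 1.0, 1.0, 1.0, 5.0, 9.0, 10.0, 10.0, 10.0, 10.0, 10.0]
voltage = [12.0, 12.0, 12.0, 12.0, 11.6, 11.4, 11.7, 11.9, 12.0, 12.0, 12.0]
window = transient_window(current, trigger_threshold=2.0, noise_band=0.5)
drop = voltage_drop(voltage, *window)
```

With the sample data above, the window spans indices 3 to 6 and the drop is about 0.6 V. A real acquisition would work on much longer records and would need a guard for steps that begin at the very first sample.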

[0007] Furthermore, the specific process of aligning the transient response time window with the triggering time of the load step instruction in the time domain and calculating the lag time from the instruction triggering time to the moment when the instantaneous current waveform first reaches the steady-state threshold is as follows: The hardware interrupt timestamp generated by the load step instruction in the operating system kernel is obtained as the instruction triggering time. This timestamp is mapped to the sampling clock axis of the power acquisition unit through a hardware synchronization signal, thus unifying the time domain reference. In the time-domain aligned instantaneous current waveform, a preset percentage of the target load current value is set as the rising edge decision threshold. A moving average filter is applied to the instantaneous current waveform to eliminate high-frequency glitches. The first intersection point between the filtered waveform and the rising edge decision threshold is searched along the time axis in the forward direction. The time difference between the instruction triggering time and this intersection point is calculated, and after deducting the fixed signal transmission delay and sensor response delay from this time difference, the lag time characterizing the power supply loop bandwidth is obtained.
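As a rough sketch of the lag time calculation above, assuming the trigger timestamp has already been mapped to a sample index on the acquisition clock: the function names, the 90% default threshold fraction, the filter width, and the single `fixed_delay` term (standing in for the combined transmission and sensor delays) are illustrative assumptions.

```python
def moving_average(samples, width):
    """Trailing moving average; early samples average over what is available."""
    out = []
    running = 0.0
    for i, s in enumerate(samples):
        running += s
        if i >= width:
            running -= samples[i - width]
        out.append(running / min(i + 1, width))
    return out


def lag_time(current, sample_period, trigger_index, target_current,
             threshold_fraction=0.9, filter_width=3, fixed_delay=0.0):
    """Time from the trigger to the first threshold crossing, minus fixed delays."""
    smoothed = moving_average(current, filter_width)
    threshold = threshold_fraction * target_current
    # Search forward along the time axis from the trigger sample.
    for i in range(trigger_index, len(smoothed)):
        if smoothed[i] >= threshold:
            return (i - trigger_index) * sample_period - fixed_delay
    return None  # the current never reached the threshold


current = [1.0, 1.0, 1.0, 1.0, 4.0, 8.0, 10.0, 10.0, 10.0, 10.0]
lag = lag_time(current, sample_period=1e-6, trigger_index=2,
               target_current=10.0, fixed_delay=1e-6)
```

Note that a trailing moving average adds its own delay of roughly half the filter width; in practice that offset would presumably be folded into the fixed delay that is deducted.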

[0008] Furthermore, the specific process of performing correlation analysis on the lag time and the voltage drop amplitude, and of determining that the power supply circuit has a dynamic adjustment anomaly when the lag time or the voltage drop amplitude exceeds the preset dynamic damping range, is as follows: A two-dimensional dynamic response evaluation plane is constructed with the lag time as the horizontal axis and the voltage drop amplitude as the vertical axis, and the currently calculated lag time and voltage drop amplitude are mapped to a measured state coordinate point on the plane. The standard damping characteristic curve of the server at the time of manufacture is retrieved as a safety boundary reference, the normal deviation vector of the measured state coordinate point relative to the standard damping characteristic curve is calculated, and the magnitude and direction angle of the deviation vector are obtained. When the magnitude of the deviation vector is greater than the preset fault tolerance radius, the anomaly type is determined according to the quadrant in which the direction angle lies: if the direction angle points to the high-lag region, it is determined that the phase margin is insufficient due to aging of the power supply loop compensation capacitor; if the direction angle points to the high-drop region, it is determined that an impedance mismatch is caused by an increase in the equivalent series resistance of the power supply output path. In either case, a dynamic adjustment anomaly in the power supply circuit is confirmed.
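One way to picture the deviation-vector test is sketched below. Several things here are assumptions rather than details from the text: the standard damping characteristic curve is represented as a polyline of (lag, drop) points, both axes are taken to be normalized to comparable scales, and the 45-degree sector boundaries used to decide which region the direction angle "points to" are purely illustrative.

```python
import math


def nearest_point_on_segment(p, a, b):
    """Closest point to p on the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return ax + t * dx, ay + t * dy


def deviation_vector(point, curve):
    """(magnitude, angle in degrees) of the shortest vector from the curve to point."""
    best = None
    for a, b in zip(curve, curve[1:]):
        qx, qy = nearest_point_on_segment(point, a, b)
        vx, vy = point[0] - qx, point[1] - qy
        magnitude = math.hypot(vx, vy)
        if best is None or magnitude < best[0]:
            best = (magnitude, math.degrees(math.atan2(vy, vx)))
    return best


def classify(point, curve, tolerance_radius):
    """Label a measured (lag, drop) point; sector boundaries are assumed."""
    magnitude, angle = deviation_vector(point, curve)
    if magnitude <= tolerance_radius:
        return "normal"
    if -45.0 <= angle < 45.0:
        return "high_lag"    # text: compensation capacitor aging, low phase margin
    if 45.0 <= angle <= 135.0:
        return "high_drop"   # text: increased ESR on the output path
    return "unclassified"


curve = [(0.0, 1.0), (1.0, 1.0)]  # hypothetical normalized reference curve
labels = [classify(p, curve, 0.2) for p in [(0.5, 1.1), (1.5, 1.0), (0.5, 1.6)]]
```

For the three sample points this yields `normal`, `high_lag`, and `high_drop` respectively. A factory-characterized curve would replace the two-point placeholder used here.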

[0009] Furthermore, if the dynamic adjustment of the power supply circuit is determined to be normal, the instantaneous current waveform is subjected to fundamental wave removal processing to retain the high-frequency conducted interference component. The specific process of dividing the high-frequency conducted interference component into multiple independent frequency band subspaces using the time-frequency signal decomposition algorithm is as follows: By tracking the power frequency fundamental component in the instantaneous current waveform in real time, a standard sinusoidal reference signal with the same frequency and phase as the fundamental component is constructed. The instantaneous current waveform and the standard sinusoidal reference signal are differentially processed to filter out the large-amplitude power frequency energy and extract the small-amplitude residual signal as the original conducted interference component. A mother wavelet function with orthogonal compact support characteristics is selected to perform multi-level wavelet packet decomposition on the original conducted interference component, generating a decomposition tree structure containing low-frequency approximation coefficients and high-frequency detail coefficients. The coefficients of each node at the end of the decomposition tree are reconstructed by a single branch, decoupling the broadband interference signal in the time domain into non-overlapping narrow-band time domain signals covering different center frequencies, and defining each narrow-band time domain signal as an independent frequency band subspace.
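The fundamental removal and wavelet packet split can be illustrated with a small standard-library sketch. It makes several simplifying assumptions: the mains frequency is taken as known rather than tracked in real time, the sinusoidal reference is fitted by projection over a record spanning whole mains cycles, the Haar wavelet stands in for the unspecified orthogonal compactly supported mother wavelet, and the sketch stops at terminal-node coefficients instead of performing single-branch reconstruction.

```python
import math


def remove_fundamental(samples, sample_rate, mains_hz):
    """Subtract the best-fit sinusoid at mains_hz from the samples."""
    n = len(samples)
    w = 2.0 * math.pi * mains_hz / sample_rate
    sin_ref = [math.sin(w * k) for k in range(n)]
    cos_ref = [math.cos(w * k) for k in range(n)]
    # Projection onto sin/cos; exact when the record spans whole mains cycles.
    a = 2.0 / n * sum(s * r for s, r in zip(samples, sin_ref))
    b = 2.0 / n * sum(s * r for s, r in zip(samples, cos_ref))
    return [s - a * sr - b * cr for s, sr, cr in zip(samples, sin_ref, cos_ref)]


def haar_split(x):
    """One orthonormal Haar step: (approximation, detail) coefficients."""
    scale = math.sqrt(0.5)
    approx = [(x[i] + x[i + 1]) * scale for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * scale for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet(x, levels):
    """Terminal-node coefficients of a full Haar packet tree (natural order)."""
    nodes = [list(x)]
    for _ in range(levels):
        # Unlike a plain wavelet transform, both halves are split again.
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


fs, f0, n = 1600.0, 50.0, 64  # two full 50 Hz cycles at 1600 samples/s
w0 = 2.0 * math.pi * f0 / fs
signal = [10.0 * math.sin(w0 * k + 0.3) + 0.1 * (-1) ** k for k in range(n)]
residual = remove_fundamental(signal, fs, f0)
nodes = wavelet_packet(residual, levels=2)
energies = [sum(c * c for c in node) for node in nodes]
```

Because the Haar steps are orthonormal, the node energies sum to the energy of the residual, so per-band energies can be read from the coefficients directly. The nodes come out in natural (filter-bank) order, not sorted by center frequency, so a frequency-ordered feature vector would need a reordering step.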

[0010] Furthermore, the specific process of constructing a frequency domain feature vector characterizing the current electromagnetic noise characteristics of the server by calculating the signal energy spectral density and spectral entropy value in each frequency band subspace is as follows: A windowed Fourier transform is performed on the narrowband time-domain signal in each frequency band subspace to obtain the power distribution of that frequency band within a unit bandwidth. The root mean square value of that frequency band is calculated by integration to obtain the total signal energy of the subspace. The total signal energy of the frequency band subspace is normalized, and the negative logarithmic expectation of the probability density function of the energy distribution is calculated to obtain the spectral entropy value characterizing the degree of signal disorder in that frequency band. The total signal energy and spectral entropy value of each frequency band subspace are concatenated in order of frequency from low to high to construct a multi-dimensional frequency domain feature vector, where each dimension of the vector corresponds to the electromagnetic noise intensity and complexity within a specific frequency range.
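A minimal sketch of the feature construction in this paragraph, under some assumptions: a Hann window is used for the windowed Fourier transform (the text does not name a window), the transform is a direct DFT for self-containment, total energy is taken as the sum of squared samples, and spectral entropy is the Shannon entropy (natural logarithm) of the normalized power spectrum. The function names are hypothetical.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum of a Hann-windowed signal (direct DFT)."""
    n = len(signal)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]
    xw = [s * w for s, w in zip(signal, window)]
    spectrum = []
    for m in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * m * k / n) for k, x in enumerate(xw))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(power):
    """Shannon entropy of the power distribution normalized to sum to 1."""
    total = sum(power)
    if total == 0:
        return 0.0
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log(p) for p in probs)


def feature_vector(bands):
    """Concatenate (energy, spectral entropy) per band, in the order given."""
    features = []
    for band in bands:
        energy = sum(s * s for s in band)
        features.extend([energy, spectral_entropy(power_spectrum(band))])
    return features


n = 32
tone = [math.sin(2.0 * math.pi * 4 * k / n) for k in range(n)]   # ordered, narrow
impulse = [1.0 if k == n // 2 else 0.0 for k in range(n)]        # flat spectrum
features = feature_vector([tone, impulse])
```

The pure tone concentrates its power in a few bins and gets a low entropy, while the impulse spreads power evenly over all 17 bins and reaches the maximum, `log(17)`. That contrast is the kind of "degree of disorder" the paragraph describes.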

[0011] Furthermore, when the frequency domain feature vector shows abnormal fluctuations in spectral entropy, the reference frequency domain feature vectors of adjacent nodes on the same power supply bus are read. The specific process of calculating the correlation between the frequency domain feature vector and the reference frequency domain feature vector in the corresponding frequency band is as follows: a sensitivity threshold for spectral entropy change is set, and the temporal variance of the high-frequency band spectral entropy value in the frequency domain feature vector is monitored in real time. When the variance changes abruptly and exceeds the sensitivity threshold, a collaborative verification mechanism is triggered. Through the infrastructure management interface of the data center, other servers physically connected to the same power distribution unit are identified as neighboring nodes, and the reference frequency domain feature vectors generated by the neighboring nodes at the same time are requested and synchronously obtained. The energy trend similarity between the frequency domain feature vector and the reference frequency domain feature vector in each corresponding frequency band subspace is calculated using the Pearson correlation coefficient algorithm, generating a spectral correlation sequence composed of multiple correlation coefficients.
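The trigger and correlation steps above might look roughly like this. The data layout is an assumption: each node keeps, per frequency band, a short time series of band energies, and the neighbor's series is taken to be already time-aligned. The function names and sample values are hypothetical.

```python
import math
from statistics import pvariance


def entropy_variance_exceeds(entropy_series, sensitivity_threshold):
    """True when the temporal variance of spectral entropy crosses the threshold."""
    return pvariance(entropy_series) > sensitivity_threshold


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spectral_correlation(local_history, neighbor_history):
    """One coefficient per band from paired per-band energy series."""
    return [pearson(loc, ref) for loc, ref in zip(local_history, neighbor_history)]


local_history = [[1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 5.0]]
neighbor_history = [[2.0, 4.0, 6.0, 8.0], [5.0, 1.0, 4.0, 0.0]]
triggered = entropy_variance_exceeds([1.0, 1.0, 1.0, 3.0], 0.5)
correlations = spectral_correlation(local_history, neighbor_history)
```

In this toy data the first band moves in lockstep on both nodes (coefficient 1.0), while the second band moves oppositely (about -0.94), which is the kind of contrast the later common-mode test relies on.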

[0012] Furthermore, the specific process of separating the common-mode interference component introduced by the external power grid and determining whether there is a physical malfunction in the server's internal hardware based solely on the remaining differential-mode interference component is as follows: The spectral correlation sequence is traversed, and frequency bands with correlation coefficients higher than the preset common-mode judgment value are selected, marked as common-mode interference frequency bands affected by background noise from the external power grid, and their abnormal energy is suppressed by weighting or set to zero. Frequency bands with correlation coefficients lower than the common-mode judgment value are retained as differential-mode interference frequency bands, and the residual energy features and spectral entropy features in these bands are extracted and matched against a preset hardware fault fingerprint database. If non-Gaussian burst high-energy pulses are present in the high-frequency band, breakdown or aging of components inside the switching power supply module is determined; if periodic modulation noise is present in the low-frequency band, an electromagnetic feedback abnormality generated by the cooling fan motor is determined. In either case, a physical malfunction in the server's internal hardware is confirmed.
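The band separation step can be sketched as a simple threshold pass over the correlation sequence. The function name, the `suppression` weight (0.0 for zeroing, a value between 0 and 1 for weighted suppression), and the sample numbers are assumptions; matching against the fault fingerprint database is not modeled here.

```python
def separate_modes(energies, correlations, common_mode_threshold, suppression=0.0):
    """Suppress bands that correlate with the neighbor; keep the rest unchanged.

    Returns (residual_energies, common_mode_band_indices).
    """
    residual = []
    common_bands = []
    for band, (energy, r) in enumerate(zip(energies, correlations)):
        if r > common_mode_threshold:
            # Highly correlated with the neighbor: treated as grid background.
            common_bands.append(band)
            residual.append(energy * suppression)
        else:
            # Weakly correlated: treated as differential-mode, kept as is.
            residual.append(energy)
    return residual, common_bands


energies = [4.0, 1.0, 2.5]
correlations = [0.95, 0.2, 0.85]
residual, common_bands = separate_modes(energies, correlations, 0.8)
```

With a judgment value of 0.8, bands 0 and 2 are marked as common-mode and zeroed, leaving only band 1 for the subsequent fault matching.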

[0013] The present invention has the following beneficial effects:

[0014] (1) A method for identifying server malfunctions, which can effectively assess the dynamic health of the power supply circuit and solve the problem that static monitoring cannot detect device aging. By synchronously acquiring instantaneous voltage and current waveforms during the transient process of load step, this method breaks through the blind spot of traditional steady-state monitoring. By using the current ramp-up slope to define the transient window and deeply analyzing the time-domain lag relationship between command triggering and current response and voltage drop characteristics, a quantitative assessment of the dynamic damping characteristics of the power supply circuit is achieved. This active excitation combined with passive response detection mechanism can sensitively capture dynamic adjustment anomalies caused by the increase of the equivalent series resistance of the filter capacitor or the insufficient phase margin of the voltage regulator module, thereby achieving early warning before the hardware completely fails.

[0015] (2) A method for identifying server malfunctions. Through fundamental frequency removal processing and time-frequency signal decomposition, this method can extract weak high-frequency conducted interference components from a strong power background and use spectral entropy values to characterize the complexity of electromagnetic noise. More importantly, by introducing a topology comparison mechanism of adjacent nodes on the same busbar, common-mode interference introduced by the external power grid is eliminated using spectral correlation analysis, retaining only differential-mode interference components that reflect internal hardware faults. This dual mechanism of frequency domain feature extraction combined with spatial interference decoupling ensures that even under power grid quality fluctuations, the physical anomalies inside the server can still be accurately located.

[0016] Of course, any product implementing this invention does not necessarily need to achieve all of the advantages described above at the same time.

Attached Figure Description

[0017] Figure 1 is a flowchart of a server operation anomaly identification method according to the present invention.

Detailed Implementation

[0018] By providing a server operation anomaly identification method, this embodiment of the application solves the technical problems that degradation of the power supply's dynamic performance is difficult to detect and that the false alarm rate is high when the power grid environment fluctuates.

[0019] The overall approach of the scheme in this application is as follows: It abandons the single steady-state indicator monitoring and shifts to in-depth analysis of transient waveforms and high-frequency noise of electrical variables. First, load mutations are used as physical probes to diagnose the dynamic regulation capability of the power supply system by measuring the transient response and time delay of voltage and current. Second, based on the benchmark of normal dynamic regulation, signal processing technology is further used to extract high-frequency noise fingerprints from the power lines, and environmental background noise is filtered out through lateral comparison with neighboring nodes, ultimately achieving a precise profile and anomaly identification of the physical operating status of the server's internal hardware.

[0020] Referring to Figure 1, this invention provides a technical solution: a method for identifying server malfunctions, comprising the following steps:

S1. During the transient process of the server executing a load step command, instantaneous voltage and current waveforms are acquired through a power acquisition unit connected to the power input circuit. The transient response time window before and after the load change is defined based on the ramp slope of the instantaneous current waveform, and the voltage drop amplitude within this time window is extracted as an electrical response feature;

S2. The transient response time window is time-domain aligned with the triggering time of the load step command. The lag time from the command triggering time to the moment when the instantaneous current waveform first reaches the steady-state threshold is calculated, and the lag time is correlated with the voltage drop amplitude. When the lag time or the voltage drop amplitude exceeds a preset dynamic damping range, it is determined that the power supply circuit has a dynamic adjustment anomaly.

S3. If the dynamic adjustment of the power supply circuit is determined to be normal, the instantaneous current waveform is processed to remove the fundamental wave to retain the high-frequency conducted interference component. The high-frequency conducted interference component is divided into multiple independent frequency band subspaces by applying a time-frequency signal decomposition algorithm, and the signal energy spectral density and spectral entropy value in each frequency band subspace are calculated to construct a frequency domain feature vector characterizing the current electromagnetic noise characteristics of the server.

S4. When the frequency domain feature vector shows abnormal fluctuations in spectral entropy value, the reference frequency domain feature vector of the adjacent node on the same power supply bus is read. By calculating the correlation between the frequency domain feature vector and the reference frequency domain feature vector in the corresponding frequency band, the common-mode interference component introduced by the external power grid is separated. The server internal hardware is judged to have physical operation abnormalities based only on the remaining differential-mode interference component.

[0021] In this implementation scheme:

S1: The core of this step is to use load step as an active excitation source to capture the dynamic behavior of the power supply circuit at the physical level through the power acquisition unit. During the transient process of the server executing a high-intensity load step command (e.g., the CPU instantly going from idle to full load), the power supply current will change drastically. At this time, the system locks the "transient response time window" based on the rise slope of the instantaneous current waveform (i.e., the first derivative of current with respect to time, di/dt). This transient response time window refers to the transition period from the start of the load change to the recovery of voltage and current to a new steady state, which contains the most critical physical information reflecting the power supply performance. The voltage drop amplitude extracted within this window specifically refers to the maximum instantaneous voltage drop caused by the power supply output impedance and line inductance at the moment of load surge. The technical role of this step is that by locking this specific physical process, the raw electrical fingerprint reflecting the transient support capability of the power supply unit (PSU) can be directly obtained, providing a clean physical data foundation for subsequent analysis.

S2: This step aims to evaluate the dynamic adjustment agility of the power supply circuit. The system first precisely aligns the time axis of the transient waveform acquired by the physical layer with the instruction triggering time of the logic layer in the time domain to eliminate transmission delay errors. The subsequently calculated lag time refers to the time difference between the instruction being issued (demand generated) and the current waveform actually rising to the steady-state threshold (energy arrival). This parameter directly reflects the response speed of the power supply loop. The dynamic damping range is a multi-dimensional safety domain based on control theory, used to measure the stability and convergence speed of a second-order circuit system under step response. Combining the lag time with the voltage drop amplitude analysis essentially detects whether the equivalent series resistance (ESR) and phase margin of the power supply loop have drifted. If the data exceeds the preset range, it indicates that the capacitors inside the power module are aging or the feedback loop gain has decreased, leading to untimely "blood supply" or overshoot oscillation. The technical role of this step is to identify "sub-healthy" hardware that, although the steady-state voltage is normal, has severely degraded dynamic regulation capabilities, preventing calculation errors caused by sluggish power supply response.

S3: If the power supply dynamic performance is acceptable, this step further delves into the microscopic electrical noise characteristics. First, the instantaneous current waveform undergoes fundamental frequency removal processing, i.e., filtering out the 50 Hz / 60 Hz main power frequency energy using a filter, retaining only the high-frequency conducted interference components carrying fault information. Then, a time-frequency signal decomposition algorithm (such as wavelet packet decomposition) is applied to deconstruct these complex noise signals into different independent frequency band subspaces. The signal energy spectral density calculated based on this reflects the intensity distribution of noise at different frequencies, while the spectral entropy value, borrowed from information theory, is used to quantify the degree of disorder or complexity of the signal energy distribution. Typically, early physical damage to hardware (such as partial discharge caused by microcracks in the insulation layer) manifests as abrupt changes in the spectral entropy value within a specific frequency band. The technical role of this step is to transform elusive analog noise signals into quantifiable frequency domain feature vectors, thereby enabling the detection of hidden physical layer faults such as poor contact and precursors to component breakdown.

S4: This step addresses the problem of false alarms caused by the complex electromagnetic environment in industrial settings. When an abnormal spectral entropy value is detected, the system utilizes the power supply topology to read the characteristics of adjacent nodes on the same power supply bus (PDU, a common busbar providing power to multiple servers). By calculating the correlation between the two in their corresponding frequency bands, the nature of the interference is distinguished: common-mode interference usually originates from the external power grid (such as large motor startup or lightning surges) and exhibits high correlation on all nodes of the same busbar; while differential-mode interference is unique noise generated within the circuit, showing significant characteristics only on the faulty device. This step uses mathematical correlation analysis to remove externally introduced "environmental background noise," determining hardware faults solely based on the remaining, independently characteristic differential-mode components. The technical role of this step is to greatly improve the anti-interference capability and accuracy of anomaly identification, ensuring that alarm signals strictly correspond to damage to internal physical components of the server, rather than fluctuations in the external power grid.

[0022] Specifically, the process of defining the transient response time window before and after a load change based on the ramp-up slope of the instantaneous current waveform, and extracting the voltage drop amplitude within this time window as an electrical response feature, is as follows: The instantaneous current waveform data acquired by the power acquisition unit is processed by first-order differentiation to generate a current rate of change sequence. This sequence is scanned, and the moment when the rate of change amplitude first exceeds a preset trigger threshold is identified as the start time of the transient response. Starting from this moment, the decay trend of the current rate of change sequence is continuously monitored. When the rate of change amplitude continuously converges within a preset steady-state noise tolerance band, this moment is marked as the end time of the transient response, thus defining a closed transient response time window. The arithmetic mean of the instantaneous voltage waveforms before the start time of the transient response is used as the reference voltage. The instantaneous voltage waveform data within the transient response time window is traversed, and the local minimum point of the voltage amplitude is searched and identified. The difference between the reference voltage and the local minimum point is calculated, and this difference is defined as the voltage drop amplitude.

[0023] In this implementation scheme, defining the transient response time window before and after a load change based on the ramp-up slope of the instantaneous current waveform, and extracting the voltage drop amplitude within this time window as an electrical response characteristic, is essentially a digital reconstruction of the physical step response of the power circuit. To accurately capture the instant of the load change, the discrete instantaneous current waveform data acquired by the power acquisition unit is first subjected to a first-order difference operation to generate a current change rate sequence. The system then scans this sequence and locks the moment when the change rate amplitude first exceeds a preset trigger threshold as the start time of the transient response. The preset trigger threshold is set to three times the standard deviation of the background current noise, measured statistically while the server is idle, so that random noise does not cause false triggering. Starting from the start time, the decay trend of the current change rate sequence is continuously monitored. When the change rate amplitude continuously converges within a preset steady-state noise tolerance band, the circuit has completed charge redistribution and established a new steady state. This moment is then marked as the end time of the transient response, thus defining a closed transient response time window. Based on this, the voltage drop amplitude is calculated as follows:

$$\Delta V_{drop} = \frac{1}{N}\sum_{i=k_s-N}^{k_s-1} V_{pre}(i) \;-\; \min_{k_s \le k \le k_e} V_{win}(k)$$

The meanings of the parameters in the formula are as follows: $\Delta V_{drop}$: the calculated voltage drop amplitude, which characterizes the degree of voltage sag caused by the power supply output impedance during sudden load changes; $N$: the number of sampling points before the start of the transient response that are used to calculate the reference voltage; $V_{pre}(i)$: the instantaneous voltage sampling sequence before the load change; $V_{win}(k)$: the instantaneous voltage sampling sequence within the transient response time window; $k_s$: the index of the start time of the transient response; $k_e$: the index of the end time of the transient response; $\min(\cdot)$: the minimum value function. Through this step, the analog voltage fluctuations are transformed into quantifiable electrical response characteristics, which directly reflect the transient support capability of the power distribution network under high-frequency load impacts.
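The window definition and voltage drop extraction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the `settle_count` parameter (how many consecutive quiet samples count as "continuously converged"), and the sample data are all hypothetical.

```python
from statistics import mean

def transient_window(current, trigger, tolerance, settle_count=3):
    """Return (start, end) sample indices of the transient response window."""
    # First-order difference approximates the current rate of change.
    rate = [current[i + 1] - current[i] for i in range(len(current) - 1)]
    # Start: first sample whose rate magnitude exceeds the trigger threshold.
    start = next((i for i, r in enumerate(rate) if abs(r) > trigger), None)
    if start is None:
        return None
    # End: first sample of a run of settle_count consecutive samples whose
    # rate stays inside the steady-state tolerance band (assumed criterion).
    quiet = 0
    for i in range(start + 1, len(rate)):
        quiet = quiet + 1 if abs(rate[i]) <= tolerance else 0
        if quiet == settle_count:
            return start, i - settle_count + 1
    return start, len(rate)

def voltage_drop(voltage, start, end, n_ref):
    """Reference voltage (mean of up to n_ref samples before start) minus
    the minimum voltage inside the window. Requires start > 0."""
    reference = mean(voltage[max(0, start - n_ref):start])
    return reference - min(voltage[start:end + 1])

# Hypothetical load step from 1 A to 5 A with a 0.4 V sag on a 12 V rail.
current = [1, 1, 1, 1, 1, 3, 5, 5, 5, 5, 5, 5]
voltage = [12.0, 12.0, 12.0, 12.0, 12.0, 11.6, 11.8, 12.0, 12.0, 12.0, 12.0, 12.0]
window = transient_window(current, trigger=0.5, tolerance=0.1)
drop = voltage_drop(voltage, window[0], window[1], n_ref=4)
```

In practice the trigger threshold would come from the idle-state noise statistics described in the text rather than a hand-picked constant.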

[0024] Specifically, the process of aligning the transient response time window with the trigger time of the load step instruction in the time domain and calculating the lag time from the instruction trigger time to the moment when the instantaneous current waveform first reaches the steady-state threshold is as follows: The hardware interrupt timestamp generated by the load step instruction in the operating system kernel is obtained as the instruction trigger time. This timestamp is mapped to the sampling clock axis of the power acquisition unit through a hardware synchronization signal, thus unifying the time domain reference. In the time-domain aligned instantaneous current waveform, a preset percentage of the target load current value is set as the rising edge decision threshold. A moving average filter is applied to the instantaneous current waveform to eliminate high-frequency glitches. The first intersection point between the filtered waveform and the rising edge decision threshold is searched along the time axis in the forward direction. The time difference between the instruction trigger time and this intersection point is calculated, and after deducting the fixed signal transmission delay and sensor response delay from this time difference, the lag time characterizing the power supply loop bandwidth is obtained.

[0025] In this implementation, aligning the transient response time window with the trigger time of the load step instruction in the time domain, and calculating the lag time from the instruction trigger time to the moment when the instantaneous current waveform first reaches the steady-state threshold, aims to eliminate measurement errors caused by heterogeneous system clocks and to restore the true physical delay of the power supply loop. First, the hardware interrupt timestamp generated by the load step instruction in the operating system kernel is obtained as the instruction trigger time. This timestamp is then mapped to the sampling clock axis of the power acquisition unit using a hardware synchronization signal (such as a GPIO trigger pulse), giving software time and physical time a unified reference. In the time-domain aligned instantaneous current waveform, a moving average filter is used to smooth the waveform so that high-frequency switching noise does not disturb the threshold-crossing judgment. A preset percentage (usually 90%) of the target load current value is set as the rising edge decision threshold. A forward search along the time axis locks the first intersection point between the filtered waveform and the rising edge decision threshold. The lag time characterizing the power supply loop bandwidth is then calculated using the following formula:

$$T_{lag} = t_{cross} - t_{trig} - \tau_{cable} - \tau_{sensor}$$

The meanings of the parameters in the formula are as follows: $T_{lag}$: the corrected pure lag time, which directly reflects the phase response speed of the power supply control loop; $t_{cross}$: the moment when the current waveform intersects the rising edge decision threshold; $t_{trig}$: the mapped instruction trigger time; $\tau_{cable}$: the fixed transmission delay of the signal on the transmission cable, determined by the cable length and the signal propagation speed; $\tau_{sensor}$: the inherent response delay of the current sensor, determined by the sensor's factory calibration parameters. By subtracting the fixed hardware delays, this step isolates the lag characteristics that depend only on the dynamic performance of the power supply regulation loop, providing unbiased data for the subsequent evaluation of loop stability.
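A minimal Python sketch of this lag calculation follows. It assumes the trigger timestamp has already been mapped onto the sampling clock axis; the function names, the filter width, and the delay values are hypothetical choices for illustration only.

```python
def moving_average(samples, width):
    """Trailing moving average; the first width-1 outputs use a shorter window."""
    out, acc = [], 0.0
    for i, x in enumerate(samples):
        acc += x
        if i >= width:
            acc -= samples[i - width]
        out.append(acc / min(i + 1, width))
    return out

def lag_time(current, sample_period, t_trigger, target, fraction=0.9,
             width=4, cable_delay=0.0, sensor_delay=0.0):
    """Time from the mapped trigger instant to the first threshold crossing,
    minus the fixed cable and sensor delays. Returns None if never crossed."""
    threshold = fraction * target          # rising edge decision threshold
    smooth = moving_average(current, width)
    first = max(int(t_trigger / sample_period), 0)
    for i in range(first, len(smooth)):    # forward search along the time axis
        if smooth[i] >= threshold:
            return i * sample_period - t_trigger - cable_delay - sensor_delay
    return None

# Hypothetical 1 MHz capture: the load steps from 0 A to 10 A at sample 10,
# and the instruction trigger maps to t = 8 us on the sampling clock axis.
current = [0] * 10 + [10] * 10
lag = lag_time(current, sample_period=1e-6, t_trigger=8e-6, target=10,
               cable_delay=0.5e-6, sensor_delay=1e-6)
```

Note that a trailing moving average adds its own delay of roughly half the filter width, so a real implementation would either keep the width small relative to the expected lag or compensate for it alongside the other fixed delays.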

[0026] Specifically, the process of performing correlation analysis between the lag time and the voltage drop amplitude, and determining that there is a dynamic adjustment anomaly in the power supply circuit when the lag time or the voltage drop amplitude exceeds the preset dynamic damping range, is as follows: A two-dimensional dynamic response evaluation plane is constructed with the lag time as the horizontal axis and the voltage drop amplitude as the vertical axis, and the currently calculated lag time and voltage drop amplitude are mapped to a measured state coordinate point on the plane. The standard damping characteristic curve recorded for the server at the factory is retrieved as a safety boundary reference, the normal deviation vector of the measured state coordinate point relative to the standard damping characteristic curve is calculated, and the magnitude and direction angle of the deviation vector are obtained. When the magnitude of the deviation vector is greater than the preset fault tolerance radius, the anomaly type is determined according to the quadrant in which the direction angle lies. If the direction angle points to the high lag region, it is determined that the phase margin is insufficient due to aging of the power supply loop compensation capacitor. If the direction angle points to the high drop region, it is determined that an impedance mismatch is caused by an increase in the equivalent series resistance of the power supply output path. In either case, it is confirmed that there is a dynamic adjustment anomaly in the power supply circuit.

[0027] In this implementation scheme, the correlation analysis between the lag time and the voltage drop amplitude, and the determination that the power supply circuit has a dynamic adjustment anomaly when either quantity exceeds the preset dynamic damping range, is a vectorized diagnosis of the power supply health status based on multi-dimensional feature fusion. First, a two-dimensional dynamic response evaluation plane is constructed with the lag time as the horizontal axis and the voltage drop amplitude as the vertical axis. The currently calculated lag time and voltage drop amplitude are mapped to a measured state coordinate point on the plane. Then, the standard damping characteristic curve of the server at the time of manufacture is retrieved as a safety boundary reference. This curve describes the standard response trajectory of an ideal power supply under different load conditions. Next, the deviation of the measured state coordinate point from the standard damping characteristic curve is calculated using the following formulas:

$$D = \sqrt{w_t\,(T_{lag} - T_{ref})^2 + w_v\,(\Delta V_{drop} - \Delta V_{ref})^2}$$

$$\theta = \operatorname{atan2}\!\left(\sqrt{w_v}\,(\Delta V_{drop} - \Delta V_{ref}),\ \sqrt{w_t}\,(T_{lag} - T_{ref})\right)$$

The meanings of the parameters in the formulas are as follows: $D$: the magnitude of the calculated deviation vector, which quantifies the overall degree to which the current state deviates from the ideal operating point; $w_t$: the normalized weighting coefficient for the lag time, determined from the inverse of the variance of the lag time in historical normal data; $w_v$: the normalized weighting coefficient for the voltage drop amplitude, used to balance the influence of two quantities with different physical dimensions; $T_{lag}$: the lag time obtained from the current measurement; $T_{ref}$: the reference lag time of the point on the standard damping characteristic curve closest to the measured point; $\Delta V_{drop}$: the currently measured voltage drop amplitude; $\Delta V_{ref}$: the reference voltage drop amplitude of the corresponding point on the standard damping characteristic curve; $\theta$: the direction angle of the calculated deviation vector, which indicates the physical nature of the fault. When the magnitude of the deviation vector exceeds the preset tolerance radius, the system characterizes the fault based on the quadrant in which the direction angle lies: if the direction angle points to the high lag region, the phase margin of the power supply loop is insufficient, indicating aging of the compensation capacitor; if the direction angle points to the high drop region, the output path impedance is mismatched, indicating increased equivalent series resistance. Through vector analysis, this step moves from single-value alarms to fault root cause localization.
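The vectorized diagnosis can be illustrated with a short Python sketch. Several details here are assumptions rather than statements from the application: the standard damping characteristic curve is represented as a list of sampled `(lag, drop)` points, the deviation is measured in variance-weighted coordinates, and the rule that maps the direction of the deviation to a fault label (dominant positive lag excess versus dominant positive drop excess) is one plausible reading of "the quadrant in which the direction angle lies".

```python
import math

def classify_deviation(lag, drop, curve, w_lag, w_drop, radius):
    """Return (magnitude, angle, label) for one measured (lag, drop) point.

    curve  : sampled reference points [(lag_ref, drop_ref), ...] (assumed form)
    w_lag  : weight for lag deviations, e.g. 1 / variance of normal lag data
    w_drop : weight for drop deviations, e.g. 1 / variance of normal drop data
    radius : tolerance radius in weighted (dimensionless) units
    """
    def weighted_distance(ref):
        return math.sqrt(w_lag * (lag - ref[0]) ** 2 +
                         w_drop * (drop - ref[1]) ** 2)

    ref = min(curve, key=weighted_distance)      # closest reference point
    magnitude = weighted_distance(ref)
    d_lag = math.sqrt(w_lag) * (lag - ref[0])    # weighted deviation components
    d_drop = math.sqrt(w_drop) * (drop - ref[1])
    angle = math.atan2(d_drop, d_lag)

    if magnitude <= radius:
        label = "normal"
    elif d_lag > 0 and d_lag >= abs(d_drop):
        label = "high_lag"     # text: insufficient phase margin, capacitor aging
    elif d_drop > 0:
        label = "high_drop"    # text: increased equivalent series resistance
    else:
        label = "unclassified"
    return magnitude, angle, label

# Hypothetical reference curve and weights (lag in seconds, drop in volts).
curve = [(2e-6, 0.05), (3e-6, 0.08)]
w_lag, w_drop = 1 / (0.2e-6) ** 2, 1 / 0.01 ** 2
slow = classify_deviation(4e-6, 0.085, curve, w_lag, w_drop, radius=3.0)
saggy = classify_deviation(3.05e-6, 0.14, curve, w_lag, w_drop, radius=3.0)
healthy = classify_deviation(3e-6, 0.081, curve, w_lag, w_drop, radius=3.0)
```

Weighting by inverse variance makes both axes dimensionless, so a single tolerance radius can be applied to quantities measured in seconds and volts.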

[0028] Specifically, if the dynamic adjustment of the power supply circuit is determined to be normal, the instantaneous current waveform is processed to remove the fundamental frequency to retain the high-frequency conducted interference component. The specific process of dividing the high-frequency conducted interference component into multiple independent frequency band subspaces using the time-frequency signal decomposition algorithm is as follows: By tracking the power frequency fundamental component in the instantaneous current waveform in real time, a standard sinusoidal reference signal with the same frequency and phase as the fundamental component is constructed. The instantaneous current waveform and the standard sinusoidal reference signal are differentially processed to filter out the large-amplitude power frequency energy and extract the small-amplitude residual signal as the original conducted interference component. A mother wavelet function with orthogonal compact support characteristics is selected to perform multi-level wavelet packet decomposition on the original conducted interference component, generating a decomposition tree structure containing low-frequency approximation coefficients and high-frequency detail coefficients. The coefficients of each node at the end of the decomposition tree are reconstructed by a single branch to decouple the broadband interference signal in the time domain into non-overlapping narrow-band time domain signals covering different center frequencies, and each narrow-band time domain signal is defined as an independent frequency band subspace.

[0029] In this implementation scheme, if the power supply circuit dynamic adjustment is determined to be normal, the instantaneous current waveform is processed to remove the fundamental frequency to retain the high-frequency conducted interference component. The process of dividing the high-frequency conducted interference component into multiple independent frequency band subspaces using a time-frequency signal decomposition algorithm aims to eliminate the masking effect of the main power frequency energy on weak fault characteristics and achieve refined separation of non-stationary signals. First, the zero-crossing point of the acquired current data is locked using software phase-locked loop technology to generate a standard sinusoidal reference signal. Then, a time-domain difference operation is performed between the instantaneous current waveform and this reference signal to remove the large-amplitude 50Hz or 60Hz fundamental frequency, thereby obtaining a pure residual signal. Next, a mother wavelet function with orthogonal compact support characteristics (such as the Daubechies wavelet) is selected to perform multi-level wavelet packet decomposition on the residual signal. Unlike ordinary wavelet transform, which only decomposes the low-frequency part, wavelet packet decomposition iteratively decomposes the high-frequency details simultaneously, thereby generating a complete decomposition tree. 
To obtain the time-domain waveform of each frequency band, the node coefficients at the end of the decomposition tree are reconstructed using a single branch, calculated as follows:

$$x_j^n(t) = \sum_{k} d_j^n(k)\,\psi_{j,n,k}(t)$$

The meanings of the parameters in the formula are as follows: $x_j^n(t)$: the reconstructed narrowband time-domain signal of the $n$-th frequency band subspace, which characterizes the details of the time-varying conducted interference within a specific frequency range; $j$: the depth of the wavelet packet decomposition, a parameter that determines the fineness of the frequency resolution; $n$: the index of the frequency node, corresponding to a particular frequency band location; $k$: the time shift factor, used to locate the position of the signal on the time axis; $d_j^n(k)$: the wavelet packet decomposition coefficients of the $n$-th node in the $j$-th layer, obtained from the convolution and downsampling process; $\psi_{j,n,k}(t)$: the wavelet packet basis function of the $j$-th layer and $n$-th node, derived from the mother wavelet function. Through the above reconstruction process, the originally superimposed broadband interference signals are decoupled into a series of non-overlapping narrowband signals. Each frequency band subspace effectively corresponds to a virtual bandpass filter channel, enabling independent analysis of fault noise at specific frequencies (such as the switching frequency harmonics of a switching power supply).
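To make the decomposition and single-branch reconstruction concrete, the following Python sketch implements a full wavelet packet tree with the Haar wavelet, which is the simplest orthogonal, compactly supported Daubechies wavelet (db1). The choice of Haar, the function names, and the sample signal are assumptions made to keep the example dependency-free; a production system would more likely use a longer Daubechies filter from a wavelet library.

```python
import math

S = math.sqrt(2.0)

def split(x):
    """One Haar analysis step: (approximation, detail), each half as long."""
    half = len(x) // 2
    a = [(x[2 * k] + x[2 * k + 1]) / S for k in range(half)]
    d = [(x[2 * k] - x[2 * k + 1]) / S for k in range(half)]
    return a, d

def merge(a, d):
    """Inverse of split: rebuild the parent coefficients from both children."""
    x = []
    for ak, dk in zip(a, d):
        x += [(ak + dk) / S, (ak - dk) / S]
    return x

def packet_leaves(x, depth):
    """Full wavelet packet tree: BOTH children are split at every level.
    Returns {path: coefficients}; path is a tuple of 0 (approx) / 1 (detail).
    len(x) must be divisible by 2**depth."""
    nodes = {(): list(x)}
    for _ in range(depth):
        nxt = {}
        for path, coeffs in nodes.items():
            a, d = split(coeffs)
            nxt[path + (0,)] = a
            nxt[path + (1,)] = d
        nodes = nxt
    return nodes

def reconstruct_branch(path, coeffs):
    """Single-branch reconstruction: invert along one path, siblings zeroed."""
    x = list(coeffs)
    for bit in reversed(path):
        zeros = [0.0] * len(x)
        x = merge(x, zeros) if bit == 0 else merge(zeros, x)
    return x

# Hypothetical residual signal after fundamental removal.
residual = [1.0, 3.0, -2.0, 4.0, 0.5, 0.5, 7.0, -1.0]
leaves = packet_leaves(residual, depth=2)
bands = {p: reconstruct_branch(p, c) for p, c in leaves.items()}
# Because the transform is orthogonal, the band signals sum to the input.
total = [sum(b[i] for b in bands.values()) for i in range(len(residual))]
```

The leaves come out in natural (tree) order. Sorting them by ascending center frequency requires a Gray-code reordering of the node indices, which wavelet libraries typically provide as an option.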

[0030] Specifically, the process of calculating the signal energy spectral density and spectral entropy value in each frequency band subspace and constructing a frequency domain feature vector characterizing the current electromagnetic noise characteristics of the server is as follows: A windowed Fourier transform is performed on the narrowband time-domain signal in each frequency band subspace to obtain the power distribution of that frequency band per unit bandwidth. This power distribution is integrated over the frequency band to obtain the total signal energy of the subspace. The energy distribution within the frequency band subspace is normalized by the total signal energy so that it forms a probability distribution, and the expectation of the negative logarithm of this distribution is calculated to obtain the spectral entropy value characterizing the degree of signal disorder in that frequency band. The total signal energy and spectral entropy value of each frequency band subspace are concatenated in order of frequency from low to high to construct a multi-dimensional frequency domain feature vector, where each dimension of the vector corresponds to the electromagnetic noise intensity or complexity within a specific frequency range.

[0031] In this implementation scheme, the process of calculating the signal energy spectral density and spectral entropy value within each frequency band subspace, and constructing a frequency domain feature vector characterizing the current electromagnetic noise characteristics of the server, is a key step in converting the analog time-varying signal into a machine-readable digital fingerprint. First, a windowed Fourier transform is applied to the narrowband time-domain signal within each frequency band subspace to suppress spectral leakage and obtain the power distribution per unit bandwidth of that frequency band. Then, the power spectral density is integrated to obtain the total energy of that subspace, and the spectral entropy value, characterizing the degree of signal disorder, is further calculated. The formula for calculating the spectral entropy value is as follows:

$$H = -\sum_{i=1}^{N} \frac{P_i}{P_{total}} \log\frac{P_i}{P_{total}}, \qquad P_{total} = \sum_{i=1}^{N} P_i$$

The meanings of the parameters in the formula are as follows: $H$: the calculated spectral entropy value of this frequency band subspace; the lower the value, the more the noise is dominated by deterministic harmonics (such as the noise of a running fan), while the higher the value, the more the noise is dominated by random broadband pulses (such as electrical spark discharges); $N$: the total number of discrete frequency points within this frequency band subspace; $i$: the index variable for frequency points; $P_i$: the signal energy spectral density value at the $i$-th frequency point, representing the amount of physical energy carried at that frequency point; $P_{total}$: the sum of the energy spectral densities of all frequency points within this frequency band subspace, used to normalize the values into a probability distribution. Through the above calculations, the system generates a two-dimensional "energy-entropy" feature for each frequency band. Finally, the features of all subspaces are concatenated in ascending frequency order to construct a high-dimensional frequency domain feature vector. This vector records not only the energy distribution of the interference signal but also its complexity, thus providing rich information for distinguishing different types of hardware faults.
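As an illustration of the energy and entropy features, the Python sketch below takes per-band spectral density values that are assumed to have been computed already (for example by a windowed Fourier transform) and builds the concatenated feature vector. The function names and sample values are hypothetical, and the natural logarithm is an assumed choice of base.

```python
import math

def spectral_entropy(psd):
    """Shannon entropy of the normalized spectral density of one band."""
    total = sum(psd)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for p in psd:
        if p > 0:                      # zero bins contribute nothing
            q = p / total              # normalize to a probability
            entropy -= q * math.log(q)
    return entropy

def band_features(band_psds):
    """Concatenate (total energy, spectral entropy) for each band.
    band_psds must already be ordered from low to high frequency."""
    vector = []
    for psd in band_psds:
        vector += [sum(psd), spectral_entropy(psd)]
    return vector

# A flat spectrum is maximally disordered; a single spectral line is not.
flat = spectral_entropy([1, 1, 1, 1])
line = spectral_entropy([5, 0, 0, 0])
features = band_features([[1, 1], [4, 0]])
```

The flat four-bin spectrum yields the maximum entropy for four bins, ln 4, while the single-line spectrum yields zero, matching the interpretation in the text that low entropy indicates deterministic harmonics and high entropy indicates broadband random noise.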

[0032] Specifically, when the frequency domain feature vector shows abnormal fluctuations in spectral entropy, the reference frequency domain feature vectors of adjacent nodes on the same power supply bus are read. The specific process of calculating the correlation between the frequency domain feature vector and the reference frequency domain feature vector in the corresponding frequency band is as follows: a sensitivity threshold for spectral entropy changes is set, and the temporal variance of the high-frequency band spectral entropy value in the frequency domain feature vector is monitored in real time. When the variance changes abruptly and exceeds the sensitivity threshold, a collaborative verification mechanism is triggered. Through the infrastructure management interface of the data center, other servers physically connected to the same power distribution unit are identified as neighboring nodes, and the reference frequency domain feature vectors generated by the neighboring nodes at the same time are requested and synchronously obtained. The energy trend similarity between the frequency domain feature vector and the reference frequency domain feature vector in each corresponding frequency band subspace is calculated using the Pearson correlation coefficient algorithm, generating a spectral correlation sequence composed of multiple correlation coefficients.

[0033] In this implementation scheme, reading the reference frequency domain feature vectors of adjacent nodes on the same power supply busbar when the frequency domain feature vector shows abnormal fluctuations in spectral entropy, and calculating the correlation between the frequency domain feature vector and the reference frequency domain feature vector in the corresponding frequency band, is a key step in using the topological redundancy of the data center for environmental denoising. First, the system sets a sensitivity threshold for spectral entropy changes, typically 3 to 5 times the standard deviation of the spectral entropy during historical normal operation. The temporal variance of the spectral entropy value in the high-frequency band of the frequency domain feature vector (usually above 100 kHz, which is most sensitive to discharge noise) is monitored in real time. Once an abrupt variance change exceeding the sensitivity threshold is detected, it indicates a drastic change in signal complexity and triggers a collaborative verification mechanism. Through an infrastructure management interface (such as a DCIM system), the system identifies other servers physically connected to the same power distribution unit (PDU) as neighboring nodes, because these nodes share the same mains power input and should in theory be exposed to the same grid background noise. After requesting and synchronously obtaining the reference frequency domain feature vectors generated by neighboring nodes at the same time, the energy trend similarity between the two is calculated using the Pearson correlation coefficient algorithm. The calculation formula is as follows:

$$r_k = \frac{\sum_{t=1}^{T}\left(E_k^{A}(t) - \bar{E}_k^{A}\right)\left(E_k^{B}(t) - \bar{E}_k^{B}\right)}{\sqrt{\sum_{t=1}^{T}\left(E_k^{A}(t) - \bar{E}_k^{A}\right)^2}\,\sqrt{\sum_{t=1}^{T}\left(E_k^{B}(t) - \bar{E}_k^{B}\right)^2}}$$

The meanings of the parameters in the formula are as follows: $r_k$: the calculated Pearson correlation coefficient of the $k$-th frequency band subspace, which ranges from -1 to 1; the closer the value is to 1, the more synchronized the noise fluctuations of the two servers are in that frequency band; $k$: the index of the frequency band subspace; $T$: the length of the time sliding window used for the correlation calculation, which determines the time span of the correlation analysis; $E_k^{A}(t)$: the signal energy value measured by the target server (the machine under test) in the $k$-th frequency band at the $t$-th time point; $\bar{E}_k^{A}$: the arithmetic mean of the target server's energy in this frequency band within the time sliding window; $E_k^{B}(t)$: the reference signal energy value measured by the neighboring node (reference device) in the same frequency band at the same time point; $\bar{E}_k^{B}$: the arithmetic mean of the neighboring node's energy in this frequency band within the time sliding window. This calculation step transforms isolated single-machine detection into multi-machine collaborative verification, quantifies the degree of linear correlation between the target signal and the environmental background, and provides a mathematical basis for subsequently distinguishing the nature of interference sources.
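The per-band correlation step can be sketched in Python as follows. The data layout (one energy time series per band, covering the same sliding window on both servers) and the function names are assumptions for illustration, as is the choice to return 0.0 when one series is constant and the coefficient is undefined.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0   # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_a * var_b)

def spectral_correlation(target, neighbor):
    """target[k] and neighbor[k] hold the energy time series of band k
    over the same sliding window; returns one coefficient per band."""
    return [pearson(t, n) for t, n in zip(target, neighbor)]

# Band 0 rises and falls together on both servers (grid-borne noise);
# band 1 fluctuates only on the target server.
target = [[1, 2, 3, 4], [1, 5, 1, 5]]
neighbor = [[2, 4, 6, 8], [3, 3, 4, 2]]
sequence = spectral_correlation(target, neighbor)
```

Here the first band correlates almost perfectly, which the method would attribute to the shared power input, while the second band does not track the neighbor and would be kept for differential-mode analysis.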

[0034] Specifically, the process of separating the common-mode interference component introduced by the external power grid and determining whether there is a physical malfunction in the server's internal hardware based solely on the remaining differential-mode interference component is as follows: The spectral correlation sequence is traversed, and the frequency bands with correlation coefficients higher than the preset common-mode judgment value are selected and marked as common-mode interference frequency bands affected by background noise from the external power grid; the abnormal energy within these frequency bands is suppressed by weighting or set to zero. The frequency bands with correlation coefficients lower than the common-mode judgment value are retained as differential-mode interference frequency bands, and the residual energy characteristics and spectral entropy characteristics within them are extracted. The extracted residual characteristics are input into a preset hardware fault fingerprint database for matching. If there are non-Gaussian distributed bursts of high-energy pulses in the high-frequency band, the fault is determined to be a breakdown or aging of internal components of the switching power supply module. If there is periodic modulation noise in the low-frequency band, the fault is determined to be abnormal electromagnetic feedback generated by the cooling fan motor. In either case, it is confirmed that there is a physical malfunction in the server's internal hardware.

[0035] In this implementation, the process of separating the common-mode interference component introduced by the external power grid and determining whether there is a physical malfunction in the server's internal hardware based solely on the remaining differential-mode interference component is essentially an adaptive filtering operation based on spatial correlation. First, the spectral correlation sequence generated in the previous step is traversed, and frequency bands with correlation coefficients higher than a preset common-mode judgment value (e.g., 0.8) are selected. These frequency bands exhibit high synchronicity and are marked as common-mode interference frequency bands affected by external power grid background noise (such as harmonic pollution). To accurately assess the hardware status, the energy within these frequency bands is suppressed by weighting, so that only the low-correlation differential-mode interference frequency bands are retained. The formula for extracting and evaluating the intensity of the differential-mode interference component is as follows:

$$R_k = \frac{P_k}{1 + e^{\alpha\,(r_k - r_{th})}}$$

The meanings of the parameters in the formula are as follows: $R_k$: the calculated residual intensity of differential-mode interference in the $k$-th frequency band after common-mode suppression, representing the noise energy generated solely by factors internal to the server; $P_k$: the originally measured signal energy spectral density value of the $k$-th frequency band; $\alpha$: the steepness coefficient of the suppression function, used to adjust the attenuation rate of the common-mode signal; $r_{th}$: the preset common-mode judgment threshold; the energy in any frequency band whose correlation coefficient exceeds this value is rapidly attenuated; $r_k$: the correlation coefficient of this frequency band calculated in the previous step. After this calculation, the features of the frequency bands with high residual energy are extracted and input into the hardware fault fingerprint database. If there are non-Gaussian distributed bursts of high-energy pulses in the high-frequency band (i.e., high residual intensity and high spectral entropy), the feature matching algorithm determines that the cause is partial discharge due to breakdown of MOSFETs or capacitors, or insulation aging, inside the switching power supply module (VRM). If there is periodic modulation noise in the low-frequency band (i.e., high residual intensity but relatively stable spectral entropy), the cause is determined to be abnormal electromagnetic feedback due to wear of the cooling fan motor bearing or a short circuit between coil turns. This step applies dual filtering on mathematical and physical features so that alarm results point to failures of physical components inside the server and false alarms caused by power grid fluctuations are excluded.
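The suppression and matching logic can be sketched in Python as below. The logistic form of the suppression function is an assumption consistent with the description of a steepness coefficient and a judgment threshold, not a formula confirmed by the application. The steepness value, the `label_band` rule, and its limit parameters are likewise hypothetical stand-ins for the hardware fault fingerprint database.

```python
import math

def differential_residual(energy, corr, threshold=0.8, steepness=20.0):
    """Attenuate bands whose correlation with the neighbor exceeds the
    common-mode judgment threshold (assumed logistic roll-off)."""
    return [e / (1.0 + math.exp(steepness * (r - threshold)))
            for e, r in zip(energy, corr)]

def label_band(is_high_band, residual, entropy, residual_limit, entropy_limit):
    """Toy stand-in for fingerprint matching, following the two rules
    stated in the text; both limits are hypothetical tuning values."""
    if residual <= residual_limit:
        return "normal"
    if is_high_band and entropy > entropy_limit:
        return "power module discharge suspected"
    if not is_high_band and entropy <= entropy_limit:
        return "fan motor feedback suspected"
    return "unmatched"

# Three bands with equal raw energy but different neighbor correlation.
energy = [10.0, 10.0, 10.0]
corr = [0.95, 0.8, 0.1]
residual = differential_residual(energy, corr)
verdict = label_band(True, residual[2], entropy=1.3,
                     residual_limit=5.0, entropy_limit=1.0)
```

With these example values, the strongly correlated band is attenuated to a small fraction of its energy, the band exactly at the threshold is halved, and the uncorrelated band passes almost unchanged before being matched against the rules.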

[0036] In summary, this application has at least the following effects:

[0037] The method for identifying server operational anomalies captures the instantaneous electrical waveforms during load step changes, defines the transient window using the current ramp-up rate, and analyzes the lag time and voltage drop characteristics. This enables quantitative diagnosis of the dynamic adjustment capability of the power supply circuit and effectively identifies latent component aging that static monitoring cannot detect. Furthermore, the method uses time-frequency signal decomposition to extract high-frequency conducted interference fingerprints and adaptively eliminates common-mode background noise introduced by the external power grid by comparing the spectral correlation of adjacent nodes on the same busbar. Fault determination is based solely on the differential-mode component reflecting internal physical damage, thus achieving anomaly identification with both high sensitivity and a low false alarm rate in complex electromagnetic environments.

[0038] Those skilled in the art will understand that embodiments of the present invention can be provided as methods, systems, or computer program products. Therefore, the present invention can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention can take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0039] This invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce a device for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0040] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0041] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

[0042] Although preferred embodiments of the invention have been described, those skilled in the art, upon learning the basic inventive concept, can make other changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including both the preferred embodiments and all changes and modifications falling within the scope of the invention.

[0043] Obviously, those skilled in the art can make various modifications and variations to this invention without departing from its spirit and scope. Therefore, if these modifications and variations fall within the scope of the claims of this invention and their equivalents, this invention also intends to include these modifications and variations.

Claims

1. A method for identifying server malfunctions, characterized in that it includes the following steps:
S1. During the transient process of the server executing the load step command, the instantaneous voltage waveform and instantaneous current waveform are acquired by the power acquisition unit connected to the power input circuit. The transient response time window before and after the load change is defined according to the rise slope of the instantaneous current waveform, and the voltage drop amplitude within the time window is extracted as the electrical response feature.
S2. Align the transient response time window with the triggering time of the load step command in the time domain, calculate the lag time from the triggering time of the command to the moment when the instantaneous current waveform first reaches the steady-state threshold, and perform correlation analysis between the lag time and the voltage drop amplitude. When the lag time or the voltage drop amplitude exceeds the preset dynamic damping range, it is determined that there is a dynamic adjustment abnormality in the power supply circuit.
S3. If the dynamic adjustment of the power supply circuit is determined to be normal, the instantaneous current waveform is processed to remove the fundamental wave in order to retain the high-frequency conducted interference component. The high-frequency conducted interference component is divided into multiple independent frequency band subspaces by applying the time-frequency signal decomposition algorithm, and the signal energy spectral density and spectral entropy value in each frequency band subspace are calculated to construct a frequency domain feature vector characterizing the current electromagnetic noise characteristics of the server.
S4. When the frequency domain feature vector shows abnormal fluctuations in spectral entropy, read the reference frequency domain feature vector of adjacent nodes on the same power supply busbar. By calculating the correlation between the frequency domain feature vector and the reference frequency domain feature vector in the corresponding frequency band, separate the common-mode interference component introduced by the external power grid. Based only on the remaining differential-mode interference component, determine whether there is a physical malfunction in the internal hardware of the server.

2. The method for identifying server malfunctions according to claim 1, characterized in that the specific process of defining the transient response time window before and after a load change based on the rise slope of the instantaneous current waveform, and extracting the voltage drop amplitude within this time window as the electrical response feature, is as follows: The instantaneous current waveform data acquired by the power acquisition unit is processed by first-order differentiation to generate a current change rate sequence. The sequence is scanned, and the moment when the change rate amplitude first exceeds the preset trigger threshold is locked as the start moment of the transient response. Starting from the start moment, the decay trend of the current change rate sequence is continuously monitored. When the change rate amplitude has converged and stays within the preset steady-state noise tolerance band, that moment is marked as the end moment of the transient response, thereby defining the closed transient response time window. The arithmetic mean of the instantaneous voltage waveform before the start moment of the transient response is used as the reference voltage. The instantaneous voltage waveform data within the transient response time window are traversed to search for and lock the local minimum point of the voltage amplitude. The difference between the reference voltage and the voltage at the local minimum point is calculated and defined as the voltage drop amplitude.
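The window definition and voltage drop extraction in claim 2 can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the `settle_samples` parameter (how many consecutive in-band samples count as "converged"), and the choice to mark the first sample of the settled run as the end moment are all assumptions made for illustration.

```python
from statistics import mean


def transient_window(current, dt, trigger_threshold, noise_band, settle_samples):
    """Return (start, end) indices of the transient window, or None if no step is seen.

    Hypothetical helper: settle_samples is an assumed parameter, not from the claim.
    """
    # First-order difference approximates the current change rate.
    rate = [(b - a) / dt for a, b in zip(current, current[1:])]
    # Start moment: change-rate magnitude first exceeds the trigger threshold.
    start = next((i for i, r in enumerate(rate) if abs(r) > trigger_threshold), None)
    if start is None:
        return None
    # End moment: the rate stays inside the noise band for settle_samples in a row;
    # the first sample of that settled run is reported (an assumption).
    run = 0
    for i in range(start, len(rate)):
        run = run + 1 if abs(rate[i]) <= noise_band else 0
        if run >= settle_samples:
            return start, i - settle_samples + 1
    return start, len(rate) - 1  # the record ended before the current settled


def voltage_drop(voltage, start, end):
    """Pre-transient mean voltage minus the minimum voltage inside the window."""
    if start == 0:
        raise ValueError("no pre-transient samples available for the reference voltage")
    reference = mean(voltage[:start])
    return reference - min(voltage[start:end + 1])


current = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
voltage = [12.0, 12.0, 12.0, 12.0, 11.9, 11.6, 11.8, 12.0, 12.0, 12.0, 12.0, 12.0]
window = transient_window(current, dt=1.0, trigger_threshold=1.0,
                          noise_band=0.5, settle_samples=3)
drop = voltage_drop(voltage, *window)
```

In this toy record the step begins at index 4, the rate settles from index 6, and the drop is about 0.4 V relative to the 12 V reference.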

3. The method for identifying server malfunctions according to claim 1, characterized in that the specific process of aligning the transient response time window with the triggering time of the load step command in the time domain, and calculating the lag time from the command triggering time to the moment when the instantaneous current waveform first reaches the steady-state threshold, is as follows: The hardware interrupt timestamp generated in the operating system kernel by the load step command is obtained as the command triggering time. This timestamp is mapped onto the sampling clock axis of the power acquisition unit through a hardware synchronization signal, so that both share a unified time reference. In the time-aligned instantaneous current waveform, a preset percentage of the target load current value is set as the rising edge decision threshold. The instantaneous current waveform is then subjected to moving average filtering to eliminate high-frequency glitches. The first intersection point between the filtered waveform and the rising edge decision threshold is searched for in the forward direction along the time axis. The time difference between the command triggering time and the intersection point is calculated, and the fixed signal transmission delay and sensor response delay are deducted from the time difference to obtain the lag time characterizing the power supply loop bandwidth.
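The lag time calculation in claim 3 can be sketched as follows, assuming the command triggering time has already been mapped to a sample index on the acquisition clock. The names `moving_average`, `lag_time`, `fraction`, `width`, and `fixed_delay` are hypothetical, and the trailing (causal) form of the moving average is an assumption; it adds a small delay of its own that would in practice be folded into the deducted fixed delay.

```python
def moving_average(samples, width):
    """Trailing moving average; the first width-1 outputs average fewer samples."""
    out, acc = [], 0.0
    for i, s in enumerate(samples):
        acc += s
        if i >= width:
            acc -= samples[i - width]  # drop the sample that left the window
        out.append(acc / min(i + 1, width))
    return out


def lag_time(current, dt, trigger_index, target_current,
             fraction=0.9, width=3, fixed_delay=0.0):
    """Time from the command trigger to the first threshold crossing, minus fixed delays.

    fraction, width and fixed_delay are illustrative parameters (assumptions).
    Returns None if the filtered current never reaches the threshold.
    """
    threshold = fraction * target_current  # rising edge decision threshold
    smooth = moving_average(current, width)
    # Search forward along the time axis from the trigger moment.
    for i in range(trigger_index, len(smooth)):
        if smooth[i] >= threshold:
            return (i - trigger_index) * dt - fixed_delay
    return None


current = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
lag = lag_time(current, dt=1e-6, trigger_index=2, target_current=10.0,
               fraction=0.9, width=3, fixed_delay=1e-6)
```

Here the filtered waveform first reaches 9 A three samples after the trigger, so the reported lag is about 2 microseconds once the assumed 1 microsecond fixed delay is deducted.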

4. The method for identifying server malfunctions according to claim 3, characterized in that the specific process of performing correlation analysis between the lag time and the voltage drop amplitude, and determining that there is a dynamic adjustment abnormality in the power supply circuit when the lag time or the voltage drop amplitude exceeds the preset dynamic damping range, is as follows: A two-dimensional dynamic response evaluation plane is constructed with the lag time as the horizontal axis and the voltage drop amplitude as the vertical axis. The currently calculated lag time and voltage drop amplitude are mapped to a measured state coordinate point on the plane. The standard damping characteristic curve of the server at the time of manufacture is retrieved as the safety boundary reference. The normal deviation vector of the measured state coordinate point relative to the standard damping characteristic curve is calculated, and the magnitude and direction angle of the deviation vector are obtained. When the magnitude of the deviation vector exceeds the preset fault tolerance radius, the abnormality type is determined according to the quadrant in which the direction angle lies. If the direction angle points to the high-lag region, it is determined that the phase margin is insufficient due to aging of the power loop compensation capacitor. If the direction angle points to the high-drop region, it is determined that there is an impedance mismatch caused by an increase in the equivalent series resistance of the power output path. In either case, it is confirmed that there is a dynamic adjustment abnormality in the power supply circuit.
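A minimal geometric sketch of the evaluation in claim 4 follows. It assumes the standard damping characteristic curve is stored as a polyline of (lag time, voltage drop) points and that both axes have been scaled to comparable units. The 45 degree sector boundaries used to decide which region the direction angle points to are an illustrative assumption; the claim does not specify them.

```python
import math


def nearest_point_on_segment(p, a, b):
    """Closest point to p on the segment a-b (all points are (x, y) tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp to the segment
    return (a[0] + t * dx, a[1] + t * dy)


def classify(point, curve, tolerance_radius):
    """Return (label, magnitude, angle_degrees) for a measured state point.

    Labels and sector limits are hypothetical, chosen for illustration only.
    """
    candidates = [nearest_point_on_segment(point, a, b)
                  for a, b in zip(curve, curve[1:])]
    nearest = min(candidates, key=lambda q: math.dist(point, q))
    # Deviation vector from the reference curve to the measured point.
    dx, dy = point[0] - nearest[0], point[1] - nearest[1]
    magnitude = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx))
    if magnitude <= tolerance_radius:
        return "normal", magnitude, angle
    if -45.0 <= angle <= 45.0:
        return "high_lag", magnitude, angle    # deviation mostly along the lag axis
    if 45.0 < angle <= 135.0:
        return "high_drop", magnitude, angle   # deviation mostly along the drop axis
    return "unclassified", magnitude, angle


curve = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]  # normalized reference curve
results = [classify(p, curve, 0.2)[0]
           for p in [(1.0, 1.1), (1.0, 1.5), (3.0, 1.0)]]
```

With this toy curve the three points are labeled normal, high-drop, and high-lag respectively; a real curve would come from factory characterization data.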

5. The method for identifying server malfunctions according to claim 1, characterized in that the specific process of removing the fundamental wave from the instantaneous current waveform to retain the high-frequency conducted interference component when the dynamic adjustment of the power supply circuit is determined to be normal, and of dividing the high-frequency conducted interference component into multiple independent frequency band subspaces using a time-frequency signal decomposition algorithm, is as follows: By tracking the fundamental frequency component of the instantaneous current waveform in real time, a standard sinusoidal reference signal with the same frequency and phase as the fundamental frequency component is constructed. The standard sinusoidal reference signal is subtracted from the instantaneous current waveform to filter out the large-amplitude power frequency energy, and the small-amplitude residual signal is extracted as the original conducted interference component. A mother wavelet function with orthogonality and compact support is selected to perform multi-level wavelet packet decomposition on the original conducted interference component, generating a decomposition tree structure containing low-frequency approximation coefficients and high-frequency detail coefficients. The coefficients of each terminal node of the decomposition tree are reconstructed by single-branch reconstruction, so that the broadband interference signal in the time domain is decoupled into non-overlapping narrowband time domain signals covering different center frequencies. Each narrowband time domain signal is defined as an independent frequency band subspace.
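The fundamental removal and single-branch wavelet packet reconstruction in claim 5 can be sketched in plain Python. Several simplifications are assumed here: the fundamental frequency is taken as already known rather than tracked in real time, the record spans a whole number of fundamental cycles, and the Haar wavelet stands in for the unspecified orthogonal compactly supported mother wavelet purely for brevity (its bands are far from sharp). All function names are hypothetical.

```python
import math


def remove_fundamental(samples, freq, fs):
    """Subtract the best-fit sinusoid at freq (same frequency and phase) from samples.

    Assumes the record covers an integer number of cycles, so the
    projection onto cos/sin gives the exact amplitude and phase.
    """
    n = len(samples)
    w = 2.0 * math.pi * freq / fs
    a = 2.0 / n * sum(x * math.cos(w * k) for k, x in enumerate(samples))
    b = 2.0 / n * sum(x * math.sin(w * k) for k, x in enumerate(samples))
    return [x - (a * math.cos(w * k) + b * math.sin(w * k))
            for k, x in enumerate(samples)]


def haar_split(x):
    """One Haar analysis step: (approximation, detail) coefficients."""
    s = math.sqrt(0.5)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_merge(approx, detail):
    """Inverse of haar_split."""
    s = math.sqrt(0.5)
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) * s, (a - d) * s]
    return out


def wavelet_packet_leaves(x, levels):
    """Full wavelet packet tree; len(x) must be divisible by 2**levels.

    Leaves come out in natural (filter-bank) order, not strict frequency order.
    """
    nodes = [x]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes


def single_branch_reconstruct(leaves, index, levels):
    """Rebuild a time-domain signal from one leaf, with all other leaves zeroed."""
    nodes = [leaf if i == index else [0.0] * len(leaf)
             for i, leaf in enumerate(leaves)]
    for _ in range(levels):
        nodes = [haar_merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]


x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.5]
leaves = wavelet_packet_leaves(x, 2)
bands = [single_branch_reconstruct(leaves, i, 2) for i in range(len(leaves))]
```

Because the transform is orthogonal and linear, the band signals sum back to the original record, which is a convenient sanity check on any decomposition of this kind.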

6. The method for identifying server malfunctions according to claim 5, characterized in that the specific process of calculating the signal energy spectral density and spectral entropy value in each frequency band subspace and constructing a frequency domain feature vector characterizing the current electromagnetic noise characteristics of the server is as follows: A windowed Fourier transform is performed on the narrowband time-domain signal in each frequency band subspace to obtain the power distribution of the frequency band per unit bandwidth. The root mean square value of the frequency band is calculated by integration to obtain the total energy of the subspace signal. The energy distribution within the frequency band subspace is normalized by the total signal energy and treated as a probability density function, and the expectation of its negative logarithm is calculated to obtain the spectral entropy value that characterizes the degree of signal disorder in the frequency band. The total energy and spectral entropy of the subspace signals of all frequency bands are concatenated in order of frequency from low to high to construct a multi-dimensional frequency domain feature vector, in which each dimension corresponds to the electromagnetic noise intensity or complexity within a specific frequency range.
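The per-band energy and spectral entropy features of claim 6 can be sketched as follows. The sketch assumes a Hann window, a direct (slow) discrete Fourier transform to avoid external libraries, and the sum of the one-sided power spectrum as the band energy; these are illustrative choices, not details stated in the claim. The names `power_spectrum`, `band_features`, and `feature_vector` are hypothetical.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum of a Hann-windowed record (direct DFT, O(n^2))."""
    n = len(signal)
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]
    xw = [s * w for s, w in zip(signal, window)]
    spectrum = []
    for m in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * m * k / n) for k, v in enumerate(xw))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_features(signal):
    """Return (energy, spectral_entropy) for one narrowband time-domain signal."""
    psd = power_spectrum(signal)
    energy = sum(psd)
    if energy == 0.0:
        return 0.0, 0.0
    p = [v / energy for v in psd]  # normalized energy distribution
    # Expectation of the negative logarithm (Shannon entropy, natural log).
    entropy = -sum(q * math.log(q) for q in p if q > 0.0)
    return energy, entropy


def feature_vector(bands):
    """Concatenate (energy, entropy) per band; bands must already be ordered low to high."""
    vec = []
    for band in bands:
        vec.extend(band_features(band))
    return vec


tone = [math.sin(2.0 * math.pi * 8.0 * k / 64.0) for k in range(64)]
impulse = [0.0] * 32 + [1.0] + [0.0] * 31
_, tone_entropy = band_features(tone)
_, impulse_entropy = band_features(impulse)
vector = feature_vector([tone, impulse])
```

A pure tone concentrates its energy in a few bins and yields a low entropy, while a single impulse has a flat spectrum and reaches the maximum entropy for the given number of bins, which matches the reading of spectral entropy as a measure of signal disorder.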

7. The method for identifying server malfunctions according to claim 1, characterized in that the specific process of reading the reference frequency domain feature vectors of adjacent nodes on the same power supply busbar when the spectral entropy in the frequency domain feature vector fluctuates abnormally, and of calculating the correlation between the frequency domain feature vector and the reference frequency domain feature vector in each corresponding frequency band, is as follows: A sensitivity threshold for spectral entropy changes is set, the temporal variance of the high-frequency spectral entropy values in the frequency domain feature vector is monitored in real time, and a collaborative verification mechanism is triggered when the variance changes abruptly and exceeds the sensitivity threshold. Through the data center's infrastructure management interface, other servers physically connected to the same power distribution unit are identified as adjacent nodes, and the reference frequency domain feature vectors generated by the adjacent nodes at the same time are requested and synchronously obtained. The energy trend similarity between the frequency domain feature vector and the reference frequency domain feature vector in each corresponding frequency band subspace is calculated using the Pearson correlation coefficient algorithm, generating a spectral correlation sequence composed of multiple correlation coefficients.
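The variance trigger and per-band Pearson correlation of claim 7 can be sketched as below. The sketch assumes each band's "energy trend" is a short history of band energies over recent acquisitions, held as one list per band for both the local server and the adjacent node. How those histories are fetched from the infrastructure management interface is outside the sketch, and the function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def entropy_alarm(entropy_history, sensitivity_threshold):
    """True when the temporal variance of the spectral entropy exceeds the threshold."""
    return pvariance(entropy_history) > sensitivity_threshold


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def spectral_correlation(local_bands, neighbor_bands):
    """One correlation coefficient per frequency band subspace."""
    return [pearson(l, n) for l, n in zip(local_bands, neighbor_bands)]


# Energy histories per band (rows) over four acquisitions (columns).
local = [[1.0, 2.0, 3.0, 4.0], [1.0, 5.0, 1.0, 5.0]]
neighbor = [[2.0, 4.0, 6.0, 8.0], [1.0, 1.0, 5.0, 5.0]]
correlations = spectral_correlation(local, neighbor)
triggered = entropy_alarm([1.0, 1.0, 1.0, 2.0], sensitivity_threshold=0.01)
```

In this toy data the first band moves in lockstep on both servers (coefficient close to 1), while the second band is uncorrelated (coefficient 0).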

8. The method for identifying server malfunctions according to claim 7, characterized in that the specific process of separating the common-mode interference component introduced by the external power grid and determining whether there is a physical malfunction in the internal hardware of the server based only on the remaining differential-mode interference component is as follows: The spectral correlation sequence is traversed, the frequency bands whose correlation coefficients are higher than the preset common-mode judgment value are selected and marked as common-mode interference frequency bands affected by background noise of the external power grid, and the abnormal energy in those frequency bands is suppressed by weighting or set to zero. The frequency bands whose correlation coefficients are lower than the common-mode judgment value are retained as differential-mode interference frequency bands. The residual energy features and spectral entropy features in the differential-mode interference frequency bands are extracted, and the extracted residual features are input into a preset hardware fault fingerprint database for matching. If there are non-Gaussian distributed, bursty high-energy pulses in the high-frequency band, it is determined that internal components of the switching power supply module have broken down or aged. If there is periodic modulation noise in the low-frequency band, it is determined that there is an electromagnetic feedback abnormality generated by the cooling fan motor. In either case, it is confirmed that there is a physical operation abnormality in the internal hardware of the server.
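The common-mode and differential-mode separation in claim 8 reduces to a per-band threshold on the correlation sequence. A minimal sketch is given below; the `suppress_weight` parameter (0.0 for zeroing, a value between 0 and 1 for weighted suppression) and the function name are assumptions. Matching against the hardware fault fingerprint database is not shown because the claim does not describe its format.

```python
def separate_bands(energies, correlations, common_mode_value, suppress_weight=0.0):
    """Suppress bands correlated with the adjacent node; keep the rest as differential-mode.

    Returns (residual_energies, differential_band_indices).
    suppress_weight is a hypothetical parameter: 0.0 zeroes common-mode bands.
    """
    residual, differential = [], []
    for index, (energy, r) in enumerate(zip(energies, correlations)):
        if r > common_mode_value:
            # High correlation with the neighbor: attributed to the external grid.
            residual.append(energy * suppress_weight)
        else:
            # Low correlation: retained as internally generated interference.
            residual.append(energy)
            differential.append(index)
    return residual, differential


energies = [4.0, 2.0, 9.0]
correlations = [0.95, 0.1, 0.5]
zeroed, diff_bands = separate_bands(energies, correlations, common_mode_value=0.8)
weighted, _ = separate_bands(energies, correlations, 0.8, suppress_weight=0.25)
```

Only the bands listed in `diff_bands` would then be passed on for fingerprint matching, so noise shared by every server on the busbar cannot by itself raise a hardware alarm.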