Method and apparatus for performance evaluation of a server

CN122240442A · Pending · Publication Date: 2026-06-19 · INSPUR SUZHOU INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
INSPUR SUZHOU INTELLIGENT TECH CO LTD
Filing Date
2026-05-22
Publication Date
2026-06-19

Figure CN122240442A_ABST

Abstract

This application discloses a server performance evaluation method and apparatus, relating to the field of server performance evaluation technology. The method includes: integrating an improved particle swarm optimization algorithm with ANFIS to dynamically adjust inertia weights through fuzzy inference, and executing the improved particle swarm optimization algorithm to optimize the fault diagnosis model parameters of BMC nodes. It also involves real-time collection of particle swarm diversity indicators and BMC node operating data for server performance evaluation and fault diagnosis. This solves the technical problem in related technologies where it is difficult to effectively balance startup efficiency, energy consumption control, and fault diagnosis accuracy when dealing with high-density, heterogeneous server clusters, thus failing to meet the needs of large-scale data centers for rapid fault location and self-healing. The method achieves the technical effect of improving diagnostic accuracy while reducing the total power consumption of the BMC cluster, reducing resource contention conflicts, and significantly improving the overall performance and reliability of the large-scale data center fault diagnosis system.

Description

Technical Field

[0001] This application relates to the field of server performance evaluation technology, and in particular to a server performance evaluation method and apparatus.

Background Technology

[0002] Currently, with the rapid expansion of data center server scale, the BMC (Baseboard Management Controller), as a core component of server management, faces severe challenges in performance evaluation and fault diagnosis capabilities.

[0003] In terms of performance evaluation and fault diagnosis, most related technologies adopt preset-threshold analysis or simple rule-based judgment strategies, which adapt poorly to complex and constantly changing hardware fault scenarios. Moreover, the diagnostic models usually run independently on each BMC node and lack cluster-level collaborative optimization, resulting in low resource utilization. In addition, related technologies struggle to optimize BMC operating parameters jointly with server hardware status, which significantly degrades the accuracy and real-time performance of diagnosis.

[0004] In summary, related technologies struggle to balance startup efficiency, energy consumption control, and fault diagnosis accuracy when dealing with high-density, heterogeneous server clusters, and fail to meet the needs of large-scale data centers for rapid fault location and self-healing; a solution is therefore urgently needed.

Summary of the Invention

[0005] This application provides a server performance evaluation method and apparatus to at least solve the technical problem in the related art that it is difficult to effectively balance startup efficiency, energy consumption control and fault diagnosis accuracy when dealing with high-density, heterogeneous server clusters, and that it cannot meet the needs of large-scale data centers for rapid fault location and self-healing.

[0006] This application provides a server performance evaluation method, comprising the following steps: obtaining configuration parameters of a target server cluster, generating corresponding particle swarm inertia weights based on a pre-trained neural fuzzy inference model, and iteratively executing a preset particle swarm optimization calculation operation based on the particle swarm inertia weights and the configuration parameters to obtain target fault diagnosis model parameters that meet preset effective solution requirements; running a baseboard management controller in the target server cluster on which the fault diagnosis model, configured with the target fault diagnosis model parameters, is deployed, so as to collect operating data of the baseboard management controller during a target time period, and performing performance evaluation on at least some target servers in the target server cluster based on the operating data to obtain performance evaluation index change values for the at least some target servers; and, in response to a performance evaluation index change value being greater than a preset change threshold, iteratively executing the preset particle swarm optimization calculation operation and the performance evaluation operation based on the particle swarm inertia weights and the configuration parameters until the performance evaluation index change value obtained during the iteration is less than or equal to the preset change threshold, generating final performance evaluation index change values corresponding to the at least some target servers, and obtaining performance evaluation results for the at least some target servers based on the final performance evaluation index change values.
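The outer evaluate-and-reoptimize loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions. `optimize` and `evaluate` are hypothetical callables. The first stands in for one PSO run, and the second stands in for BMC deployment plus evaluation over the target time period. The change value is taken as the absolute difference from the previous round's index, with a caller-supplied baseline for the first round. `max_rounds` is an added safety cap that the source does not mention.

```python
from typing import Callable, List, Tuple


def evaluate_until_stable(
    optimize: Callable[[], List[float]],
    evaluate: Callable[[List[float]], float],
    baseline: float,
    change_threshold: float,
    max_rounds: int = 20,
) -> Tuple[List[float], float, float, int]:
    """Repeat PSO + evaluation until the evaluation index change is <= threshold.

    optimize: hypothetical; runs one PSO pass and returns fault diagnosis model parameters.
    evaluate: hypothetical; deploys parameters to the BMCs, collects operating data
              over the target time period, and returns a performance evaluation index.
    max_rounds: hypothetical safety cap (must be >= 1).
    """
    previous = baseline
    params, index, change = [], baseline, float("inf")
    for round_no in range(1, max_rounds + 1):
        params = optimize()
        index = evaluate(params)
        change = abs(index - previous)  # performance evaluation index change value
        if change <= change_threshold:
            return params, index, change, round_no
        previous = index
    return params, index, change, max_rounds
```

Capping the number of rounds keeps the sketch from looping forever if the evaluation index never settles. The source itself only states the threshold condition.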

[0007] This application also provides a server performance evaluation device, comprising: a model parameter acquisition module, used to acquire configuration parameters of a target server cluster, generate corresponding particle swarm inertia weights based on a pre-trained neural fuzzy inference model, and iteratively execute a preset particle swarm optimization calculation operation based on the particle swarm inertia weights and the configuration parameters to obtain target fault diagnosis model parameters that meet preset effective solution requirements; a performance evaluation module, used to run a baseboard management controller in the target server cluster on which the fault diagnosis model configured with the target fault diagnosis model parameters is deployed, collect operating data of the baseboard management controller during a target time period, and perform performance evaluation on at least some target servers in the target server cluster based on the operating data to obtain performance evaluation index change values for the at least some target servers; and an iterative optimization module, used to, in response to a performance evaluation index change value being greater than a preset change threshold, iteratively execute the preset particle swarm optimization calculation operation and the performance evaluation operation based on the particle swarm inertia weights and the configuration parameters until the performance evaluation index change value obtained during the iteration is less than or equal to the preset change threshold, and generate final performance evaluation index change values corresponding to the at least some target servers, so as to obtain performance evaluation results for the at least some target servers based on the final performance evaluation index change values.

[0008] This application also provides an electronic device, including: a memory for storing a computer program; and a processor for executing the computer program to implement the steps of any of the above-described server performance evaluation methods.

[0009] This application also provides a non-volatile computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of any of the above-described server performance evaluation methods.

[0010] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the steps of any of the above-described server performance evaluation methods.

[0011] In this application, configuration parameters of a target server cluster are obtained, and corresponding particle swarm inertia weights are generated based on a pre-trained neurofuzzy inference model. Based on the particle swarm inertia weights and configuration parameters, a preset particle swarm optimization calculation operation is iteratively executed to obtain target fault diagnosis model parameters that meet preset effective solution requirements. A baseboard management controller in the target server cluster, on which the fault diagnosis model configured with these parameters is deployed, is then run, and its operating data over a target time period is collected. Based on this operating data, at least some of the target servers in the target server cluster are evaluated to obtain their performance evaluation index change values. In response to a performance evaluation index change value exceeding the preset change threshold, the preset particle swarm optimization (PSO) calculation operation and the performance evaluation operation are iteratively executed until the change value obtained during the iteration is less than or equal to the preset threshold. This yields final performance evaluation index change values for the at least some target servers, from which their performance evaluation results are obtained. This approach therefore addresses the technical problem in the related art that it is difficult to balance startup efficiency, energy consumption control, and fault diagnosis accuracy for high-density, heterogeneous server clusters, and that the needs of large-scale data centers for rapid fault location and self-healing cannot be met. It achieves the technical effect of improving diagnostic accuracy while reducing the total power consumption of the BMC cluster, minimizing resource contention, and significantly enhancing the overall performance and reliability of large-scale data center fault diagnosis systems.

Attached Figure Description

[0012] To more clearly illustrate the embodiments of this application, the accompanying drawings used in the embodiments will be briefly introduced below. Obviously, the drawings described below are only some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0013] Figure 1 is a flowchart of a server performance evaluation method according to an embodiment of this application;
Figure 2 is a schematic diagram of the execution logic of a server performance evaluation method according to an embodiment of this application;
Figure 3 is a schematic diagram of the execution flow of an improved particle swarm optimization algorithm according to an embodiment of this application;
Figure 4 is a schematic diagram of the logical architecture of a server performance evaluation system according to an embodiment of this application;
Figure 5 is a schematic diagram of the logical architecture of a central optimization controller according to an embodiment of this application;
Figure 6 is an example diagram of a server performance evaluation apparatus according to an embodiment of this application.

[0014] In the drawings, 10 denotes the server performance evaluation device, 100 the model parameter acquisition module, 200 the performance evaluation module, and 300 the iterative optimization module.

Detailed Implementation

[0015] The technical solutions of the embodiments of this application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort are within the protection scope of this application.

[0016] It should be noted that, in the description of this application, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. The terms "first," "second," etc., in this application are used to distinguish similar objects and are not used to describe a specific order or sequence.

[0017] To enable those skilled in the art to better understand the present application, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments.

[0018] The specific application environment architecture or specific hardware architecture on which the execution of the server performance evaluation method depends is described here.

[0019] Embodiments of this application provide a method for evaluating server performance.

[0020] Figure 1 shows a flowchart of a server performance evaluation method according to an embodiment of this application. The method includes the following steps. In step S101, the configuration parameters of the target server cluster are obtained, and the corresponding particle swarm inertia weights are generated according to the pre-trained neural fuzzy inference model. Based on the particle swarm inertia weights and configuration parameters, the preset particle swarm optimization calculation operation is iteratively executed to obtain target fault diagnosis model parameters that meet the preset effective solution requirements.

[0021] Those skilled in the art should understand that traditional server BMC fault diagnosis and server performance evaluation methods generally use static threshold detection technology or build simple and independent diagnostic models, lacking dynamic parameter optimization and cluster collaboration mechanisms, resulting in poor diagnostic accuracy and real-time performance. At the same time, they fail to combine server operating status for intelligent analysis, which can easily lead to competition for diagnostic resources when a batch of servers fail at the same time, affecting overall operation and maintenance efficiency.

[0022] Therefore, the embodiments of this application model the distributed diagnostic process as a multi-objective resource-constrained optimization problem, and utilize the cooperative search and dynamic parameter tuning capabilities of the improved particle swarm optimization algorithm to find the parameter configuration with the best overall global performance while satisfying the resource competition constraints between nodes.

[0023] In practical implementation, the embodiments of this application first train a lightweight ANFIS (Adaptive Neuro-Fuzzy Inference System) network (i.e., the neurofuzzy inference model). This network dynamically adjusts the inertia weight w of the particle swarm algorithm, and the weight is integrated into the particle swarm algorithm, which yields the improved particle swarm algorithm. Secondly, the embodiments of this application can obtain the configuration parameters issued by the central optimization controller to the server cluster. These are combined with the particle swarm inertia weights to iteratively execute the preset particle swarm optimization calculation operation (i.e., the improved particle swarm algorithm). The result is target fault diagnosis model parameters that meet the preset effective solution requirements (i.e., the global optimal solution of the particle swarm algorithm).

[0024] Therefore, the embodiments of this application adopt an improved particle swarm optimization algorithm that integrates ANFIS and dynamically adjusts the algorithm parameters according to the real-time optimization status. This optimizes the diagnostic parameters while meeting resource constraints and improves the fault diagnosis efficiency of large-scale clusters.

[0025] Optionally, in one embodiment of this application, generating corresponding particle swarm inertia weights based on a pre-trained neurofuzzy inference model includes: acquiring historical server operation data of the target server cluster and particle swarm optimization logs corresponding to a preset particle swarm algorithm; training the neurofuzzy inference model based on the historical server operation data and particle swarm optimization logs, and determining the input feature matching rules of the neurofuzzy inference model; collecting particle swarm diversity features corresponding to the particle swarm algorithm and node operation parameters of the baseboard management controller based on the input feature matching rules, wherein the particle swarm diversity features include particle dispersion and particle aggregation, and the node operation parameters include hardware status data and resource occupancy data; inputting the particle swarm diversity features and node operation parameters into the trained neurofuzzy inference model to perform preset fuzzification transformation and rule-based inference operations on the particle swarm diversity features and node operation parameters to obtain corresponding inference operation results; and performing preset defuzzification processing on the inference operation results to generate particle swarm inertia weights that meet preset adaptation requirements.

[0026] In the specific implementation process, the embodiments of this application achieve dynamic and precise adjustment of the inertia weight through deep coupling of the neurofuzzy inference model and the particle swarm optimization algorithm, as described below. 1. System training and rule construction: the embodiments of this application can collect historical operating data of the target server cluster (covering hardware status and resource usage) together with particle swarm optimization process logs, and use them as a training dataset to construct a lightweight neural fuzzy inference model. At the same time, input feature matching rules can be formulated based on data feature correlation. These rules explicitly take the particle swarm diversity features (dispersion and aggregation) and the baseboard management controller node operating parameters as the core inputs of the system.

[0027] 2. Feature data acquisition and input: during the iterations of the particle swarm optimization algorithm, the embodiments of this application can collect the following two types of core data in real time: (1) particle swarm diversity features: the dispersion and aggregation of the particle position distribution are calculated to quantify the algorithm's current balance between exploration and exploitation; (2) baseboard management controller node operating parameters, including hardware health status and resource occupancy level, which reflect the operating load of the actual application scenario.

[0028] Both types of data strictly follow the preset input feature matching rules to ensure the validity and relevance of the input data.
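A minimal Python sketch of the two particle swarm diversity features is shown below. The source does not define the formulas, so both are assumptions. Dispersion is taken as the mean Euclidean distance of particles from the swarm centroid, normalized by the diagonal of the search space. Aggregation is taken as the fraction of particles within a hypothetical radius (`radius_frac` of the diagonal) of the global best position.

```python
import math
from typing import Sequence, Tuple


def swarm_diversity(
    positions: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
    gbest: Sequence[float],
    radius_frac: float = 0.1,
) -> Tuple[float, float]:
    """Return (dispersion, aggregation), both in [0, 1] under the assumed definitions."""
    n, dim = len(positions), len(lower)
    diagonal = math.dist(lower, upper)  # length of the search-space diagonal
    centroid = [sum(p[j] for p in positions) / n for j in range(dim)]
    # Dispersion: mean distance to the centroid, normalized by the diagonal
    dispersion = sum(math.dist(p, centroid) for p in positions) / (n * diagonal)
    # Aggregation: share of particles within radius_frac * diagonal of gbest
    radius = radius_frac * diagonal
    aggregation = sum(1 for p in positions if math.dist(p, gbest) <= radius) / n
    return dispersion, aggregation
```

Normalizing both features to [0, 1] keeps them on the same scale as other fuzzy inputs, which simplifies membership-function design.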

[0029] 3. Fuzzy reasoning and weight generation: in this embodiment, the collected feature data can be input into the trained neural fuzzy inference model, and fuzzification, rule-based inference, and defuzzification are performed in sequence: (1) fuzzification maps precise inputs such as dispersion, aggregation, and hardware status onto fuzzy sets; (2) rule-based inference uses the fuzzy rule base formed by training, together with the current input features, to infer the fuzzy value of the output inertia weight; (3) defuzzification converts the fuzzy output into a precise inertia weight value. The generated weight value is integrated directly into the main loop of the particle swarm algorithm as the core parameter of the velocity update.
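The fuzzification, rule inference, and defuzzification sequence can be illustrated with a zero-order Sugeno-type inference, the structure ANFIS commonly uses. This is a hand-written sketch, not a trained model. The following are all hypothetical: the two inputs (dispersion and a BMC load level, both scaled to [0, 1]), the Gaussian membership functions, the four rules and their consequent weights, and the clamp range [0.4, 0.9]. A trained ANFIS would learn the membership and consequent parameters from the historical data and optimization logs.

```python
import math
from itertools import product

# Hypothetical Gaussian membership functions on inputs scaled to [0, 1]: (center, sigma)
MF = {"low": (0.0, 0.3), "high": (1.0, 0.3)}

# Hypothetical rule base: (dispersion label, load label) -> inertia weight consequent
RULES = {
    ("low", "low"): 0.9,    # swarm collapsed, BMCs idle: push exploration
    ("low", "high"): 0.7,
    ("high", "low"): 0.6,
    ("high", "high"): 0.4,  # swarm spread out, BMCs busy: favor fast convergence
}


def gaussian(x: float, center: float, sigma: float) -> float:
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def inertia_weight(dispersion: float, bmc_load: float,
                   w_min: float = 0.4, w_max: float = 0.9) -> float:
    # Layer 1: fuzzification of each crisp input
    mu_d = {label: gaussian(dispersion, c, s) for label, (c, s) in MF.items()}
    mu_l = {label: gaussian(bmc_load, c, s) for label, (c, s) in MF.items()}
    # Layer 2: rule firing strengths using the product t-norm
    strengths = {(a, b): mu_d[a] * mu_l[b] for a, b in product(MF, MF)}
    total = sum(strengths.values())  # always > 0 for Gaussian memberships
    # Layers 3-5: normalized weighted average of consequents (defuzzification)
    w = sum(strengths[rule] * RULES[rule] for rule in RULES) / total
    return min(max(w, w_min), w_max)
```

For example, `inertia_weight(0.5, 0.5)` fires all four rules equally and returns their mean, 0.65.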

[0030] 4. Dynamic rule update mechanism: During the iteration of the particle swarm optimization algorithm, the embodiments of this application can continuously feed new optimization logs and node running data back to the neural fuzzy inference model to achieve incremental updates of the fuzzy rule base. This ensures that the weight adjustment strategy is optimized in real time as the algorithm iterates and the server running status changes, avoiding adaptive decay caused by fixed rules. Therefore, the embodiments of this application dynamically adjust the particle swarm inertia weights through a neurofuzzy inference model to accurately match the algorithm iteration state with the server operating conditions, thereby improving the adaptive capability and convergence efficiency of the particle swarm algorithm, ensuring the accuracy and stability of the fault diagnosis model parameter optimization, and effectively improving the server fault diagnosis effect.
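One plausible form of the incremental rule update is an online least-mean-squares step on the rule consequents of a zero-order Sugeno model, sketched below. The source does not specify the update rule. The following are assumptions: the `target` weight (for example, a weight that the optimization logs associate with good progress), the learning rate, and the dictionary layout.

```python
from typing import Dict, Hashable, Tuple


def update_consequents(
    consequents: Dict[Hashable, float],
    normalized_strengths: Dict[Hashable, float],
    target: float,
    lr: float = 0.1,
) -> Tuple[Dict[Hashable, float], float]:
    """One online LMS step on zero-order Sugeno rule consequents.

    normalized_strengths: firing strength of each rule for the new sample, summing to 1.
    target: hypothetical desired inertia weight derived from new optimization logs.
    Returns the updated consequents and the prediction error before the update.
    """
    prediction = sum(normalized_strengths[r] * c for r, c in consequents.items())
    error = prediction - target
    # Gradient of 0.5 * error**2 with respect to each consequent is error * strength
    updated = {r: c - lr * error * normalized_strengths[r]
               for r, c in consequents.items()}
    return updated, error
```

Because only the rules that fired for the new sample move, and only in proportion to their firing strength, each update changes the rule base locally rather than retraining it from scratch.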

[0031] Optionally, in one embodiment of this application, based on particle swarm inertia weights and configuration parameters, a preset particle swarm optimization calculation operation is iteratively executed to obtain target fault diagnosis model parameters that meet preset effective solution requirements. This includes: traversing a preset particle swarm based on particle swarm inertia weights and configuration parameters to obtain the current position vector and current velocity vector of particles in the particle swarm; decoding the current position vector to obtain corresponding position decoding data, and sending the position decoding data to the fault diagnosis model of the baseboard management controller in the target server cluster; and running the baseboard management controller according to the position decoding data to collect corresponding controller operation data, wherein the controller operation data includes operating power consumption, diagnostic accuracy, and diagnostic utilization; and calculating the current fitness value of the current particle in the particle swarm based on the controller operation data. The algorithm acquires the global optimization position and global fitness value of the particle swarm that meet the preset optimization requirements. In response to a current fitness value greater than the global fitness value, it updates the global optimization position and global fitness value of the particle swarm, and updates the current position vector and current velocity vector of the current particle. It then determines whether the preset particle swarm optimization calculation operation meets the preset iteration termination number requirement and convergence requirement. If the preset iteration termination number requirement and convergence requirement are met, the target fault diagnosis model parameters are determined based on the updated global optimization position and global fitness value. 
Otherwise, the preset particle swarm optimization calculation operation is re-performed using the updated current position vector and current velocity vector until the preset iteration termination number requirement and convergence requirement are met, in order to obtain the corresponding target fault diagnosis model parameters.
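The source states that the fitness value is computed from operating power consumption, diagnostic accuracy, and diagnostic utilization, but it does not give the formula. The sketch below assumes a weighted sum in which a larger value is better, consistent with the "greater than" comparisons above. The following are all hypothetical: the weights, the normalization against a power budget, and the assumption that higher utilization is desirable.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ControllerRunData:
    power_w: float       # operating power consumption of the BMC, in watts
    accuracy: float      # diagnostic accuracy in [0, 1]
    utilization: float   # diagnostic utilization in [0, 1] (assumed: higher is better)


def fitness(
    data: ControllerRunData,
    power_budget_w: float,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical
) -> float:
    """Scalar fitness; larger is better."""
    w_acc, w_util, w_power = weights
    # Normalize power against a hypothetical budget so that lower power scores higher
    power_score = 1.0 - min(data.power_w / power_budget_w, 1.0)
    return w_acc * data.accuracy + w_util * data.utilization + w_power * power_score
```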

[0032] It should be noted that the embodiments of this application first require system initialization to load server cluster configuration, algorithm parameters and model repository, and load ANFIS model; secondly, the embodiments of this application can generate an initial particle swarm and randomly initialize the position vector (i.e., current position vector) and velocity vector (i.e., current velocity vector) of each particle.

[0033] In the embodiments of this application, the position vector of a particle represents the parameter configuration of all servers:

$$x_i = [x_{i1}, x_{i2}, x_{i3}, \ldots, x_{iD}]^T$$

where $x_{ij}$ is the value taken by the $i$-th particle on the $j$-th adjustable parameter (such as a fault threshold or CPU quota), $T$ denotes vector transposition, and $D$ is the total dimension of the parameters, i.e., the total number of parameters to be optimized, calculated as:

$$D = N_{\text{servers}} \times (K_{\text{model}} + M_{\text{BMC}})$$

where $N_{\text{servers}}$ is the number of servers, $K_{\text{model}}$ is the number of model parameters per server (such as thresholds and sampling rates), and $M_{\text{BMC}}$ is the number of BMC operating parameters per server (such as CPU quota and polling interval).
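The dimension formula, and the decoding of a position vector into per-server configurations (Step 2 of [0037]), can be sketched as follows. The server-major layout is an assumption for illustration: each server's K model parameters are followed by its M BMC parameters. The parameter names are also assumptions.

```python
from typing import Dict, List, Sequence


def total_dimension(n_servers: int, k_model: int, m_bmc: int) -> int:
    """D = N_servers * (K_model + M_BMC)."""
    return n_servers * (k_model + m_bmc)


def decode_position(
    x: Sequence[float],
    n_servers: int,
    model_names: Sequence[str],
    bmc_names: Sequence[str],
) -> List[Dict[str, Dict[str, float]]]:
    """Split a flat position vector into one configuration dict per server (assumed layout)."""
    k, m = len(model_names), len(bmc_names)
    if len(x) != total_dimension(n_servers, k, m):
        raise ValueError("position vector length does not match D")
    configs = []
    for s in range(n_servers):
        chunk = x[s * (k + m):(s + 1) * (k + m)]
        configs.append({
            "model": dict(zip(model_names, chunk[:k])),  # e.g. threshold, sampling rate
            "bmc": dict(zip(bmc_names, chunk[k:])),      # e.g. CPU quota, polling interval
        })
    return configs
```

For example, 4 servers with 2 model parameters and 2 BMC parameters each give D = 4 × (2 + 2) = 16.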

[0034] The velocity vector of a particle is:

$$v_i = [v_{i1}, v_{i2}, v_{i3}, \ldots, v_{iD}]^T$$

where $v_{ij}$ is the update velocity of the $i$-th particle on the $j$-th parameter, and $D$ is the total number of parameters to be optimized, identical to the dimension of the position vector.

[0035] It should be noted that the velocity vector is initialized according to:

$$v_{ij}(0) \sim U(-0.2\,\delta_j,\ 0.2\,\delta_j), \qquad \delta_j = b_j - a_j$$

In this rule:
- $v_{ij}(0)$ is the initial velocity of the $i$-th particle in the $j$-th dimension, i.e., the initial step size and direction of parameter adjustment (positive for an increase, negative for a decrease).
- $U(\min, \max)$ denotes uniform random sampling, which keeps the initial exploration random and prevents the algorithm from being biased toward specific regions.
- $\delta_j$ is the width of the value range of the $j$-th parameter, reflecting the size of its adjustable space.
- $a_j$ and $b_j$ are the lower and upper limits of the $j$-th parameter.
- The coefficient 0.2 means that the magnitude of the initial velocity is at most 20% of the parameter range.
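A minimal Python sketch of this initialization follows. Velocities are drawn from U(-0.2δ_j, 0.2δ_j) as stated. Drawing positions uniformly within [a_j, b_j] is an assumption, since [0032] only says they are randomly initialized. The `seed` parameter is added for reproducibility.

```python
import random
from typing import List, Optional, Sequence, Tuple


def init_swarm(
    n_particles: int,
    lower: Sequence[float],
    upper: Sequence[float],
    vel_frac: float = 0.2,
    seed: Optional[int] = None,
) -> Tuple[List[List[float]], List[List[float]]]:
    rng = random.Random(seed)
    positions, velocities = [], []
    for _ in range(n_particles):
        # Assumed: positions drawn uniformly within [a_j, b_j]
        positions.append([rng.uniform(a, b) for a, b in zip(lower, upper)])
        # As stated: v_ij(0) ~ U(-0.2 * delta_j, 0.2 * delta_j), delta_j = b_j - a_j
        velocities.append([rng.uniform(-vel_frac * (b - a), vel_frac * (b - a))
                           for a, b in zip(lower, upper)])
    return positions, velocities
```

Scaling the velocity bound per dimension matters when parameters have very different ranges. An example is a fault threshold in [0, 1] alongside a polling interval in tens of seconds.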

[0036] Subsequently, the embodiments of this application can execute the improved particle swarm optimization algorithm process (i.e., the preset particle swarm optimization calculation operation), perform multi-generation iterative optimization calculation, and determine whether the corresponding effective solution (i.e., the global optimal solution gbest of the improved particle swarm optimization algorithm when the preset iteration termination number and convergence requirements are met) is successfully obtained after multi-generation iterative optimization calculation. When an effective solution is successfully obtained, the corresponding target fault diagnosis model parameters are determined based on the effective solution.

[0037] In actual implementation, the preset particle swarm optimization calculation operation in this embodiment is performed as follows:
Step 1: Calculate the particle swarm diversity, obtain the particle swarm inertia weight w dynamically calculated by ANFIS, and traverse each particle in the swarm to obtain its current position vector and current velocity vector;
Step 2: Decode the particle's position vector (i.e., the current position vector) into specific BMC parameter configurations (i.e., the position decoding data);
Step 3: Deploy the decoded parameters to the fault diagnosis model of the BMC nodes in the server cluster, run the BMCs to perform diagnostic tests, and collect the corresponding controller operation data, including power consumption, diagnostic accuracy, and diagnostic utilization;
Step 4: Based on the controller operation data, calculate the current fitness value of the current particle, and obtain the global best position gbest and the global best fitness value of the particle swarm that meet the preset optimization requirements;
Step 5: Determine whether the current fitness value of the current particle is better than (i.e., greater than) the current global best fitness value. If so, update the global best position gbest and the global best fitness value; otherwise, keep gbest unchanged;
Step 6: Update the current position vector and current velocity vector of the current particle;
Step 7: Determine whether the preset particle swarm optimization calculation operation has reached the iteration termination number and whether the algorithm convergence requirement is met. If both are satisfied, determine the target fault diagnosis model parameters based on the updated global best position gbest and global best fitness value. Otherwise, re-perform the particle swarm optimization calculation operation with the updated position and velocity vectors until the preset iteration number and convergence requirement are met, so as to obtain the target fault diagnosis model parameters.
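The loop of Steps 1 to 7 can be sketched in Python as follows. It is a minimal sketch, not the applicant's implementation. `objective` stands in for Steps 2 to 4 (decode, deploy to the BMCs, run, and score), and `weight_fn` stands in for the ANFIS weight (here fed only the dispersion). The following are assumptions:
- the standard velocity update v = w·v + c1·r1·(pbest - x) + c2·r2·(gbest - x);
- the coefficients c1 = c2 = 1.49618;
- the velocity clamp at 20% of each range;
- the stall-based convergence test.

The individual best is updated before the global best, as described in [0039] to [0041].

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def improved_pso(
    objective: Callable[[List[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    weight_fn: Callable[[float], float],
    n_particles: int = 20,
    max_iter: int = 100,
    c1: float = 1.49618,
    c2: float = 1.49618,
    tol: float = 1e-9,
    patience: int = 15,
    seed: int = 0,
) -> Tuple[List[float], float, int]:
    rng = random.Random(seed)
    dim = len(lower)
    vmax = [0.2 * (b - a) for a, b in zip(lower, upper)]
    xs = [[rng.uniform(a, b) for a, b in zip(lower, upper)] for _ in range(n_particles)]
    vs = [[rng.uniform(-m, m) for m in vmax] for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [-math.inf] * n_particles
    gbest, gbest_fit = xs[0][:], -math.inf
    diagonal = math.dist(lower, upper)
    stall, it = 0, 0
    for it in range(1, max_iter + 1):
        # Step 1: swarm diversity (dispersion) -> inertia weight w
        centroid = [sum(x[j] for x in xs) / n_particles for j in range(dim)]
        dispersion = sum(math.dist(x, centroid) for x in xs) / (n_particles * diagonal)
        w = weight_fn(dispersion)
        previous_gbest_fit = gbest_fit
        for i in range(n_particles):
            # Steps 2-4: decode, deploy, run, and score (stand-in objective)
            fit = objective(xs[i])
            # Step 5: individual best first, then global best (larger is better)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = xs[i][:], fit
            # Step 6: velocity and position update, clamped to vmax and bounds
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[i][j]
                     + c1 * r1 * (pbest[i][j] - xs[i][j])
                     + c2 * r2 * (gbest[j] - xs[i][j]))
                vs[i][j] = max(-vmax[j], min(vmax[j], v))
                xs[i][j] = max(lower[j], min(upper[j], xs[i][j] + vs[i][j]))
        # Step 7: stop after `patience` iterations without gbest improving by more than tol
        stall = stall + 1 if gbest_fit - previous_gbest_fit <= tol else 0
        if stall >= patience:
            break
    return gbest, gbest_fit, it
```

In a real deployment each call to `objective` is expensive, since it involves running BMC diagnostics. Reducing the number of evaluations is therefore the main motivation for an adaptive inertia weight.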

[0038] Therefore, the embodiments of this application can accurately output fault diagnosis parameters adapted to BMC nodes by performing system initialization and improved particle swarm optimization, and combining ANFIS to dynamically adjust inertia weights. In addition, running data is collected in real time during the iteration process to optimize global fitness, thereby ensuring the effectiveness of parameters and improving the accuracy and reliability of server performance evaluation.

[0039] Optionally, in one embodiment of this application, updating the global optimization position and global fitness optimization value of the particle swarm in response to the current fitness value being greater than the global fitness optimization value includes: obtaining the historical fitness optimization value of the current particle and the global fitness optimization value of the particle swarm that meet preset optimization requirements; determining the current optimization position corresponding to the current fitness value in response to the current fitness value being greater than the historical fitness optimization value; updating the individual historical optimization position and historical fitness optimization value of the current particle based on the current fitness value and the current optimization position; and determining whether the updated historical fitness optimization value is greater than the global fitness optimization value; and updating the global optimization position and global fitness optimization value of the particle swarm based on the updated individual historical optimization position and historical fitness optimization value in response to the updated historical fitness optimization value being greater than the global fitness optimization value.

[0040] As one possible approach, after obtaining the current fitness value of the current particle, this embodiment of the application can obtain the individual historical best position pbest and the optimal fitness value (i.e., the historical fitness optimization value of the current particle that meets the preset optimization requirements) of the current particle, as well as the global optimal fitness value of the particle swarm (i.e., the global fitness optimization value of the particle swarm that meets the preset optimization requirements), and determine whether the current fitness value is greater than the optimal fitness value. If the current fitness value is greater than the current particle's best fitness value, then update the current particle's historical best position pbest and best fitness value; otherwise, keep the current particle's historical best position pbest unchanged.

[0041] Subsequently, embodiments of this application can determine whether the updated optimal fitness value of the current particle is greater than the global fitness optimization value of the particle swarm; if the updated optimal fitness value of the current particle is greater than the global fitness optimization value, then based on the updated individual historical optimization position and historical fitness optimization value, the global optimization position (i.e., the global optimal position gbest) and the global fitness optimization value (i.e., the global optimal fitness value) of the particle swarm are updated; otherwise, the global optimal position gbest remains unchanged.

[0042] Therefore, the embodiments of this application adopt a two-layer update mechanism. It first updates each particle's individual best and then updates the swarm's global best. This accurately selects high-quality parameter configurations during particle swarm iteration and ensures that the best characteristics of each generation of particles are fully explored and inherited. As a result, the improved particle swarm algorithm converges more accurately and efficiently and outputs better parameters for the fault diagnosis model.
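The two-layer update in [0039] to [0042] can be sketched as follows. This is a minimal illustration, not the applicant's implementation. The `Particle`/`Swarm` containers and the `update_bests` name are hypothetical. The sketch uses the "greater than" comparison stated in the text, so a larger fitness counts as better. The composite fitness of [0052] reads like a cost, where lower is better; under that reading, both comparisons would simply be reversed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Particle:
    position: List[float]
    pbest_position: List[float] = field(default_factory=list)
    pbest_fitness: float = float("-inf")  # historical fitness optimization value


@dataclass
class Swarm:
    particles: List[Particle]
    gbest_position: List[float] = field(default_factory=list)
    gbest_fitness: float = float("-inf")  # global fitness optimization value


def update_bests(swarm: Swarm, particle: Particle, current_fitness: float) -> None:
    """Layer 1 updates the particle's pbest; layer 2 promotes it to gbest if it is better."""
    # Layer 1: keep pbest unchanged unless the current fitness improves on it.
    if current_fitness > particle.pbest_fitness:
        particle.pbest_fitness = current_fitness
        particle.pbest_position = list(particle.position)
    # Layer 2: compare the (possibly updated) pbest with the swarm's gbest.
    if particle.pbest_fitness > swarm.gbest_fitness:
        swarm.gbest_fitness = particle.pbest_fitness
        swarm.gbest_position = list(particle.pbest_position)
```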

[0043] Optionally, in one embodiment of this application, determining whether a preset particle swarm optimization calculation operation meets a preset iteration termination number requirement and a convergence requirement, wherein, if the preset iteration termination number requirement and convergence requirement are met, determining the target fault diagnosis model parameters based on the updated global optimization position and global fitness optimization value, includes: determining whether the particle swarm has been traversed, wherein, when the particle swarm has been traversed, in response to the preset particle swarm optimization calculation operation meeting the preset convergence requirement, determining whether the preset particle swarm optimization calculation operation has reached the preset iteration termination number; in response to the preset particle swarm optimization calculation operation reaching the preset iteration termination number, using the updated global optimization position and global fitness optimization value as the target global effective solution corresponding to the particle swarm, and determining the target fault diagnosis model parameters based on the target global effective solution.

[0044] In actual execution, the embodiments of this application can determine whether the particle swarm has been traversed. When the particle swarm has been traversed, the first step is to verify whether the preset particle swarm optimization calculation operation meets the dual convergence requirements. The first of the dual convergence requirements is the fitness convergence requirement, that is, the fluctuation range of the global optimal fitness value in multiple consecutive iterations is less than a preset threshold. The second is the particle distribution convergence requirement, that is, the overall dispersion of the particle swarm position vector is lower than a preset threshold, so as to ensure that the algorithm converges to a stable optimal solution rather than a local optimum.

[0045] Under the premise of satisfying the dual convergence requirement, the embodiments of this application can further determine whether the preset particle swarm optimization calculation operation has reached the preset iteration termination number; if the iteration termination number has been reached, the updated global optimization position and global fitness optimization value are used as the target global effective solution corresponding to the particle swarm, and the target fault diagnosis model parameters of the BMC node of the adaptive server cluster are obtained according to the target global effective solution; if the iteration termination number has not been reached, the next round of particle swarm iterative optimization process is continued.

[0046] Therefore, the embodiments of this application combine the dual convergence requirement with a check on the number of iterations. This prevents the algorithm from getting stuck in local optima or converging prematurely and ensures that the output global effective solution is both stable and optimal. It also provides an accurate and reliable parameter configuration for the server fault diagnosis model.
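The termination test of [0043] to [0045] can be sketched as below. This is an illustration under stated assumptions, not the applicant's code. Several elements are hypothetical: the window length, both tolerances, the function names, and the dispersion measure (the mean over dimensions of the population standard deviation of particle positions). Following the text, the sketch stops only when both convergence tests pass and the iteration budget has been reached. A practical loop would likely also need an independent hard cap, such as the maximum-iteration check in step S302 of Figure 3.

```python
import statistics
from typing import Sequence


def fitness_converged(gbest_history: Sequence[float], window: int, tol: float) -> bool:
    """Fitness convergence: gbest fitness fluctuates by less than `tol` over the last `window` iterations."""
    if len(gbest_history) < window:
        return False
    recent = gbest_history[-window:]
    return max(recent) - min(recent) < tol


def position_dispersion(positions: Sequence[Sequence[float]]) -> float:
    """Assumed dispersion measure: mean over dimensions of the population std of particle positions."""
    dims = range(len(positions[0]))
    return statistics.fmean([statistics.pstdev([p[d] for p in positions]) for d in dims])


def should_terminate(gbest_history: Sequence[float], positions: Sequence[Sequence[float]],
                     iteration: int, max_iter: int, window: int = 5,
                     fit_tol: float = 1e-3, disp_tol: float = 1e-2) -> bool:
    """Stop only when both convergence tests pass AND the preset iteration number is reached."""
    dual_converged = (fitness_converged(gbest_history, window, fit_tol)
                      and position_dispersion(positions) < disp_tol)
    return dual_converged and iteration >= max_iter
```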

[0047] Optionally, in one embodiment of this application, updating the current position vector and current velocity vector of the current particle includes: determining the current velocity and current position of the current particle in different dimensions based on the current velocity vector, and calculating the product of the particle swarm inertia weight and the velocity weight corresponding to the current velocity; determining the individual learning factor and global learning factor corresponding to the preset particle swarm optimization calculation operation according to the current velocity vector, and obtaining the individual convergence random number corresponding to the individual learning factor and the global convergence random number corresponding to the global learning factor; calculating the first position difference between the individual historical optimization position and the current position of the current particle, and calculating the individual position factor product between the first position difference, the individual learning factor and the individual convergence random number; calculating the second position difference between the global optimization position and the current position of the particle swarm, and calculating the global position factor product between the second position difference, the global learning factor and the global convergence random number; calculating the new velocity of the current particle in different dimensions based on the velocity weight product, the individual position factor product and the global position factor product, and updating the current velocity vector of the current particle using the new velocity; calculating the position velocity sum between the current position and the new velocity, and updating the current position vector of the current particle using the position velocity sum.

[0048] In a specific implementation, the velocity vector of the current particle can be updated by the following formula:
$$v_{ij}(t+1) = w \, v_{ij}(t) + c_1 r_1 \left( p_{ij} - x_{ij}(t) \right) + c_2 r_2 \left( g_j - x_{ij}(t) \right)$$
The symbols are defined as follows:
- $v_{ij}(t)$ is the current velocity of particle $i$ in dimension $j$, i.e., the step size and direction of parameter adjustment (such as the rate at which a CPU quota is increased or decreased).
- $w$ is the particle swarm inertia weight, which controls the influence of the historical velocity and balances the algorithm's global exploration against its local exploitation. When $w$ is high, particle inertia is large, which suits wide-range search but weakens local exploitation. When $w$ is low, particle inertia is small and the particle tends to search more finely within the current region.
- $c_1$ and $c_2$ are the individual learning factor and the global learning factor, which weight the particle's own experience and the swarm's experience respectively. They are usually set to $c_1 = c_2 = 1.5$.
- $r_1$ and $r_2$ are the individual convergence random number and the global convergence random number in the interval $[0, 1]$; they introduce randomness to avoid premature convergence.
- $p_{ij}$ is the historical best position of particle $i$ in dimension $j$ (i.e., the individual historical optimization position).
- $g_j$ is the global best position of the swarm in dimension $j$ (i.e., the global optimization position).
- $x_{ij}(t)$ is the current position of particle $i$ in dimension $j$.

[0049] Furthermore, after the velocity vector of the current particle has been updated, its position vector can be updated using the following formula:
$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)$$
where $x_{ij}(t)$ is the current position of particle $i$ in dimension $j$, $v_{ij}(t+1)$ is the updated velocity, and $x_{ij}(t+1)$ is the updated parameter value for the next generation (i.e., the new position of the current particle).

[0050] Therefore, the embodiments of this application update the particle velocity and position vectors precisely by calculating the inertia weight, the learning factors and the other parameters jointly. This balances each particle's own experience against the swarm's best information, and the random numbers help avoid premature convergence. Together, these improve the algorithm's exploration and exploitation capabilities and help obtain better fault diagnosis model parameters efficiently.
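The per-dimension update of [0047] to [0049] can be written directly from the two formulas. This is a minimal sketch: the `update_particle` name and signature are hypothetical, and the defaults use the $c_1 = c_2 = 1.5$ values mentioned in the text. Python's `random.Random.random()` draws from the half-open interval [0, 1), which is a negligible difference from the closed interval in [0048]. No velocity clamping is applied because the text does not describe any.

```python
import random
from typing import List, Optional, Tuple


def update_particle(x: List[float], v: List[float], pbest: List[float], gbest: List[float],
                    w: float, c1: float = 1.5, c2: float = 1.5,
                    rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """Return (new_position, new_velocity) for one particle using the standard PSO update."""
    rng = rng or random.Random()
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()      # individual / global convergence random numbers
        inertia = w * v[j]                        # velocity weight product
        cognitive = c1 * r1 * (pbest[j] - x[j])   # individual position factor product
        social = c2 * r2 * (gbest[j] - x[j])      # global position factor product
        vj = inertia + cognitive + social         # new velocity in dimension j
        new_v.append(vj)
        new_x.append(x[j] + vj)                   # position-velocity sum
    return new_x, new_v
```

When a particle already sits on both its pbest and the gbest, only the inertia term survives, so the particle coasts with velocity $w \, v_{ij}(t)$.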

[0051] Optionally, in one embodiment of this application, calculating the current fitness value of the current particle in the particle swarm based on controller operating data includes: determining the total operating power consumption and maximum power consumption of the target server cluster based on configuration parameters, and calculating the power consumption normalized data corresponding to the target server cluster based on the total operating power consumption and maximum power consumption; determining the CPU utilization of multiple servers in the target server cluster, and calculating the average utilization of the CPU utilization of multiple servers, so as to determine the utilization standard deviation corresponding to the target server cluster based on the average utilization; calculating the diagnostic misjudgment rate of the fault diagnosis model based on controller operating data and configuration parameters, and determining the constraint penalty of the target server cluster; constructing a corresponding fitness function based on the power consumption normalized data, utilization standard deviation, constraint penalty and diagnostic misjudgment rate, so as to calculate the current fitness value of the current particle in the particle swarm through the fitness function.

[0052] It should be noted that the embodiments of this application may construct a fitness function from four parameters: the power consumption normalization parameter ($P_{norm}$), the diagnostic misjudgment rate ($1 - A$), the load balancing parameter ($\sigma_u$) and the constraint penalty parameter (Penalty). This fitness function is used, together with the controller operating data, to calculate the current fitness value of the current particle in the particle swarm. The fitness function is:
$$f(x_i) = \alpha \cdot P_{norm} + \beta \cdot (1 - A) + \gamma \cdot \sigma_u + \lambda \cdot \mathrm{Penalty}$$
The symbols are defined as follows:
- $x_i$ is the position vector (parameter configuration) of the $i$-th particle.
- $\alpha$, $\beta$ and $\gamma$ are weighting coefficients ($\alpha = 0.4$, $\beta = 0.3$, $\gamma = 0.3$).
- $\lambda$ is the penalty coefficient: $\lambda = 100$ when a hard constraint is violated, otherwise $\lambda = 0$.
- $P_{norm}$ is the power consumption normalization parameter, which evaluates the ratio of the total power consumption under the current parameter configuration to the maximum allowable power consumption of the system.
- $1 - A$ is the diagnostic misjudgment rate (1 minus the accuracy rate).
- $\sigma_u$ is the load balancing parameter, i.e., the standard deviation of CPU utilization across all server BMCs.
- Penalty is the constraint penalty parameter, which handles parameter configurations that violate system constraints.

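[0053] In the embodiments of this application, the power consumption normalization parameter $P_{norm}$, the load balancing parameter $\sigma_u$ and the constraint penalty parameter Penalty are expressed mathematically as follows:

[0054] $$P_{norm} = \frac{P_{total}(x_i)}{P_{max}}$$

[0055] $$\sigma_u = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left( u_n - \bar{u} \right)^2}$$

[0056] $$\mathrm{Penalty} = \sum_{k} V_k$$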

[0057] The symbols are defined as follows:
- $P_{total}$ is the total power consumption of all server BMCs under configuration $x_i$.
- $P_{max}$ is the maximum power consumption that the data center allocates to the server BMC cluster.
- $N$ is the total number of servers.
- $u_n$ is the CPU utilization of the $n$-th BMC (e.g., 30% is represented as 0.3).
- $\bar{u}$ is the average CPU utilization of all server BMCs.
- $V_k$ is the amount by which the $k$-th constraint is violated (the portion exceeding the constraint). Examples include the total CPU quota constraint $V_{cpu}$, the single-node memory constraint $V_{mem}$ and the temperature safety constraint $V_{temp}$.

[0058] Therefore, embodiments of this application construct a composite fitness function that integrates power consumption, diagnostic accuracy, and load balancing by using power consumption normalization parameters, diagnostic misjudgment rate, load balancing parameters, and constraint penalty parameters. This function serves as the core evaluation mechanism for improving particle swarm optimization parameters. It leverages real-time monitoring data to evaluate node states online, providing a quantitative evaluation standard for the parameter combinations of each particle to guide the particle swarm towards a globally better direction. Furthermore, embodiments of this application employ a resource constraint processing strategy. During the parameter update phase, based on load balancing factors and a dynamic penalty mechanism, it precisely controls the resource allocation for parallel diagnosis of multiple BMC nodes, significantly reducing system resource contention conflicts and achieving multi-objective optimization through weight adjustment.
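The composite fitness of [0052] to [0057] can be evaluated as in the sketch below. The function name is hypothetical, and two choices are assumptions rather than statements of the source: $\sigma_u$ is taken as the population standard deviation, and Penalty is taken as the plain sum of the positive violation amounts $V_k$. The weights and the 100/0 penalty coefficient follow [0052].

```python
import statistics
from typing import Sequence


def fitness(p_total: float, p_max: float, accuracy: float,
            cpu_utils: Sequence[float], violations: Sequence[float],
            alpha: float = 0.4, beta: float = 0.3, gamma: float = 0.3,
            hard_penalty: float = 100.0) -> float:
    """f(x_i) = alpha*P_norm + beta*(1 - A) + gamma*sigma_u + lambda*Penalty."""
    p_norm = p_total / p_max                          # power consumption normalization
    misjudgment = 1.0 - accuracy                      # diagnostic misjudgment rate
    sigma_u = statistics.pstdev(cpu_utils)            # load balance: population std (assumed)
    penalty = sum(max(0.0, v) for v in violations)    # assumed: sum of amounts exceeding each constraint
    lam = hard_penalty if penalty > 0 else 0.0        # lambda = 100 only when a hard constraint is violated
    return alpha * p_norm + beta * misjudgment + gamma * sigma_u + lam * penalty
```

For example, with half of the power budget in use, 90% accuracy, perfectly balanced CPU load and no violations, the fitness is 0.4 × 0.5 + 0.3 × 0.1 = 0.23. A violation of 0.1 adds 100 × 0.1 = 10 to that value.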

[0059] Optionally, in one embodiment of this application, calculating the diagnostic misjudgment rate of the fault diagnosis model based on controller operating data and configuration parameters includes: injecting multiple faults of different types into the fault diagnosis model, and performing fault diagnosis on the multiple faults through the fault diagnosis model to obtain corresponding fault diagnosis results; determining at least one target fault diagnosis result that meets the preset accurate fault diagnosis requirements, and counting the number of faults in at least one target fault diagnosis result; calculating the diagnostic accuracy rate corresponding to the fault diagnosis model based on the number of faults, and calculating the diagnostic misjudgment rate corresponding to the fault diagnosis model based on the diagnostic accuracy rate.

[0060] As one possible approach, during the testing period the embodiments of this application can inject $N_{faults}$ faults covering server hardware, software and operational processes into the fault diagnosis model. These include but are not limited to hardware anomalies, resource overloads and data transmission failures. The fault diagnosis model then performs a full-process diagnosis of each fault to obtain the corresponding fault diagnosis results.

[0061] Next, the embodiments of this application can set hierarchical preset rules for judging whether a fault has been diagnosed accurately. According to the priority and impact range of each fault type, at least one target fault diagnosis result that satisfies these rules is filtered out, and the number of faults in the target fault diagnosis results is counted.

[0062] Subsequently, the embodiments of this application can calculate the diagnostic accuracy $A$ of the fault diagnosis model relative to the total number of injected faults, where each fault counts 1 if it is diagnosed correctly and 0 otherwise:
$$A = \frac{1}{N_{faults}} \sum_{m=1}^{N_{faults}} \mathbb{1}_m, \qquad \mathbb{1}_m = \begin{cases} 1, & \text{if the } m\text{-th fault is diagnosed correctly} \\ 0, & \text{otherwise} \end{cases}$$
The corresponding diagnostic misjudgment rate $1 - A$ is then calculated from the diagnostic accuracy. The misjudgment rate is used as the core evaluation term of the particle swarm optimization fitness function to guide the iterative optimization of the fault diagnosis model parameters.

[0063] Therefore, the embodiments of this application inject multiple types of faults and perform hierarchical judgment to accurately quantify the diagnostic accuracy and misjudgment rate, thereby providing a reliable evaluation basis for algorithm optimization, effectively driving the parameter iteration of the fault diagnosis model, improving the model's recognition accuracy for various faults, and reducing the risk of misjudgment.
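The accuracy and misjudgment calculation of [0059] to [0062] reduces to counting correct diagnoses, as in the sketch below. The function name and the use of string fault labels are hypothetical. The sketch also simplifies the hierarchical judgment rules of [0061] to an exact label match between the injected fault and the diagnosed fault; that simplification is an assumption.

```python
from typing import Sequence, Tuple


def diagnostic_rates(injected: Sequence[str], diagnosed: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy A, misjudgment rate 1 - A) over N_faults injected faults."""
    if not injected or len(injected) != len(diagnosed):
        raise ValueError("need one diagnosis per injected fault")
    # Indicator term: 1 if the diagnosis matches the injected fault, else 0 (exact match is assumed).
    correct = sum(1 for truth, pred in zip(injected, diagnosed) if truth == pred)
    accuracy = correct / len(injected)
    return accuracy, 1.0 - accuracy
```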

[0064] In step S102, the baseboard management controller with the fault diagnosis model deployed in the target server cluster is run using the parameters of the target fault diagnosis model to collect the running data of the baseboard management controller during the target time period, and the performance of at least some of the target servers in the target server cluster is evaluated based on the running data to obtain the performance evaluation index change values of at least some of the target servers.

[0065] In step S103, in response to the change value of the performance evaluation index being greater than the preset change threshold, the preset particle swarm optimization calculation operation and the performance evaluation operation are executed iteratively based on the particle swarm inertia weight and the configuration parameters. The iteration continues until the change value of the performance evaluation index obtained during the iteration is less than or equal to the preset change threshold. This generates the final performance evaluation index change value for at least some target servers, from which the performance evaluation result for those servers is obtained.

[0066] Furthermore, in the parameter deployment phase, embodiments of this application can load the target fault diagnosis model parameters into the fault diagnosis model of each baseboard management controller node in the server cluster and run it, and continuously collect the operating data of the baseboard management controller within a set time period. Based on this data, performance evaluation is carried out on multiple target servers in the cluster to output dynamic change data of performance evaluation indicators.

[0067] When the change value of the performance evaluation index exceeds the preset change threshold (i.e., server performance is not in a relatively stable state), the particle swarm optimization calculation process and the performance evaluation process can be restarted based on the particle swarm inertia weight and the server configuration parameters. They are repeated until the change value obtained within the target time period is less than or equal to the preset change threshold (i.e., server performance is in a relatively stable state). The change value at that point is the final change value of the performance evaluation index. Based on this final change value, the performance evaluation results can be obtained for the servers in the cluster that are in a stable performance state, and fault diagnosis can be performed on the basis of these results. The performance evaluation results include basic server identification data, steady-state data of core performance indicators, iterative optimization process data, and so on. The core performance indicators include CPU utilization, memory usage, model inference latency, fault identification accuracy and other indicators related to the operation of the fault diagnosis model.

[0068] Therefore, after the parameters are deployed, the embodiments of this application collect data and evaluate performance to monitor the server fault diagnosis effect in real time, and they trigger re-optimization when the indicators are abnormal. This forms a closed-loop optimization mechanism that keeps the fault diagnosis model adapted to the server operating status and improves the stability and reliability of diagnosis.

[0069] Optionally, in one embodiment of this application, the baseboard management controller deployed with the fault diagnosis model in the target server cluster is run using the target fault diagnosis model parameters to collect the operating data of the baseboard management controller during a target time period, and the performance of at least some target servers in the target server cluster is evaluated based on the operating data to obtain the performance evaluation index change values of at least some target servers. This includes: sending the target fault diagnosis model parameters to the corresponding baseboard management controller so that the baseboard management controller runs the corresponding fault diagnosis model according to the target fault diagnosis model parameters and collects the operating data of the baseboard management controller during the target time period; performing preset performance evaluation operations on multiple target servers in the target server cluster based on the operating data to obtain the minimum and maximum performance evaluation index values of multiple target servers during the target time period, and calculating the performance evaluation index change values of multiple target servers during the target time period based on the minimum and maximum performance evaluation index values.

[0070] In actual implementation, the process of calculating the change value of the performance evaluation index of the target server within the target time period in this embodiment of the application is as follows: 1. Deploy the optimal parameter configuration (i.e., the target fault diagnosis model parameters) to the data center BMC production environment; 2. Activate the long-term monitoring system to continuously collect BMC operation data (i.e., collect the operation data of the baseboard management controller within the target time period). 3. Obtain the minimum and maximum performance evaluation metrics of multiple target servers within a target time period (e.g., daily) to calculate the change in performance evaluation metrics of multiple target servers within the target time period.

[0071] Subsequently, the embodiments of this application can determine whether the performance degradation exceeds 10% (i.e., the preset change threshold). If it does, a re-optimization process is triggered to regenerate the particle swarm; otherwise, BMC running data is re-collected.

[0072] Therefore, the embodiments of this application deploy the optimized model parameters to the BMC node and collect the corresponding running data, and combine the performance index extreme value calculation change data to accurately quantify the performance fluctuation of server fault diagnosis, providing an objective basis for the iterative optimization of model parameters and ensuring the long-term stable and efficient operation of the diagnostic system.
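The change-value check of [0069] to [0071] can be sketched as below. The text says only that the change value is computed from the minimum and maximum index values within the target period. The relative form (max − min) / max used here is an assumption, and the function names are hypothetical. The 10% default threshold comes from [0071].

```python
from typing import Sequence


def change_value(metric_samples: Sequence[float]) -> float:
    """Assumed definition: relative spread (max - min) / max of a metric over the target period."""
    hi, lo = max(metric_samples), min(metric_samples)
    return (hi - lo) / hi if hi else 0.0


def needs_reoptimization(metric_samples: Sequence[float], threshold: float = 0.10) -> bool:
    """True when the change exceeds the preset change threshold, i.e. re-optimization is triggered."""
    return change_value(metric_samples) > threshold
```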

[0073] Optionally, in one embodiment of this application, the process of setting the preset change threshold includes: collecting service operation-related indicators of the server cluster, classifying service pressure levels, and outputting service pressure level labels; extracting historical load fluctuation characteristics corresponding to the service pressure level labels, calculating fluctuation coefficients, and outputting the fluctuation coefficients; matching the benchmark thresholds corresponding to the service pressure level labels, correcting the benchmark thresholds in combination with the fluctuation coefficients, and outputting a performance degradation judgment threshold, which serves as the basis for performance degradation judgment; repeating the above process according to a preset period and updating the performance degradation judgment threshold, which serves as the basis for the next round of performance degradation judgment.

[0074] Specifically, the process of dynamically setting the preset change threshold in this embodiment of the application is as follows.

Step 1: Collection and classification of multi-dimensional business pressure indicators.
(1) Indicator collection: The system collects three categories of core operational metrics for the server cluster: computing resource metrics (CPU utilization, memory usage), network transmission metrics (network throughput, concurrent connections) and service response metrics (message processing latency, request response time). Raw metric data are collected at millisecond-level frequency through the network card driver interface and the cluster monitoring system, and the interference of instantaneous peak values on the classification is eliminated.
(2) Grading rules: An unsupervised clustering algorithm is applied to the normalized indicator data to divide it into three business pressure levels: light load, medium load and heavy load. The clustering aims to minimize the fluctuation of indicators within the same level and maximize the difference between levels.
(3) Output and application: A business pressure level label (light load / medium load / heavy load) is output. This label is the input to Step 2 and is used to filter historical load data samples for the corresponding pressure level.

Step 2: Extraction of historical load fluctuation features and calculation of the fluctuation coefficient by level.
(1) Historical data filtering: Based on the pressure level label output in Step 1, receive-queue performance degradation rate data of the same level within the most recent statistical period are extracted from the cluster's historical operation database. These data form the load fluctuation dataset for that level.
(2) Fluctuation feature extraction: The standard deviation and the coefficient of variation of the load fluctuation dataset are calculated. The standard deviation reflects the dispersion of the performance degradation rate, and the coefficient of variation removes the influence of data scale; together they characterize the fluctuation features of the historical load.
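Step 2 (1) and (2) can be illustrated as follows. This is a minimal sketch that assumes history records arrive as (pressure-level label, degradation rate) pairs. The clustering of Step 1 is not reproduced, and the function name is hypothetical. The sample standard deviation is an assumed choice.

```python
import statistics
from typing import Sequence, Tuple


def fluctuation_features(history: Sequence[Tuple[str, float]], level: str) -> Tuple[float, float]:
    """Return (standard deviation, coefficient of variation) of degradation rates under `level`."""
    samples = [rate for label, rate in history if label == level]   # Step 2 (1): filter by label
    if len(samples) < 2:
        raise ValueError("need at least two samples for the selected level")
    std = statistics.stdev(samples)             # dispersion of the degradation rate (sample std, assumed)
    mean = statistics.fmean(samples)
    cv = std / mean if mean else float("inf")   # coefficient of variation removes the data scale
    return std, cv
```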

[0075] (3) Calculation of the fluctuation coefficient: A fluctuation coefficient calculation model is constructed in which the fluctuation coefficient is negatively correlated with the standard deviation of the dataset. That is, the greater the historical load fluctuation, the smaller the fluctuation coefficient, so the coefficient reflects the adjustment weight applied to the threshold. (4) Output and application: The fluctuation coefficient is output and used as the input to Step 3 to dynamically correct the baseline threshold. Step 3: Baseline threshold matching and dynamic threshold generation. (1) Preset baseline thresholds: Baseline threshold ranges are preset for each of the three service pressure levels (light load, medium load and heavy load). The initial baseline values are determined from performance limit tests of the cluster running at full load, so that the baseline threshold of each level matches its service carrying capacity.

[0076] (2) Dynamic threshold correction: Multiply the fluctuation coefficient output in step 2 by the corresponding baseline threshold to obtain the final performance degradation judgment threshold; if the fluctuation coefficient is small (large load fluctuation), the threshold is reduced; if the fluctuation coefficient is large (stable load), the threshold remains at the baseline level. (3) Output and application: Output the performance degradation judgment threshold. This threshold is directly used as the core basis for the performance degradation judgment step to determine whether the current performance degradation exceeds the acceptable range.
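The correction in Step 3 (2) multiplies the baseline by a coefficient that shrinks as historical fluctuation grows. The source states only that the coefficient is negatively correlated with the standard deviation, so the model $k = 1 / (1 + s \cdot \mathrm{std})$ below is an assumed choice. It is used because $k = 1$ for a perfectly stable history, which keeps the threshold at the baseline as [0076] describes. The sensitivity $s$, the baseline values and the function names are hypothetical placeholders for the full-load test results.

```python
from typing import Dict

# Hypothetical baseline thresholds per pressure level; in the text they come from full-load limit tests.
BASELINE_THRESHOLDS: Dict[str, float] = {"light": 0.15, "medium": 0.10, "heavy": 0.05}


def fluctuation_coefficient(std: float, sensitivity: float = 5.0) -> float:
    """Assumed model: k = 1 / (1 + sensitivity * std); k = 1 for a flat history and decreases as std grows."""
    return 1.0 / (1.0 + sensitivity * std)


def degradation_threshold(level: str, std: float, sensitivity: float = 5.0) -> float:
    """Corrected threshold = fluctuation coefficient x baseline threshold of the pressure level."""
    return fluctuation_coefficient(std, sensitivity) * BASELINE_THRESHOLDS[level]
```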

[0077] Step 4: Periodic Iterative Optimization and Threshold Update: (1) Update cycle setting: Set a threshold update cycle, the length of which is related to the level of business pressure. The cycle is longer in light load scenarios and shorter in heavy load scenarios to ensure that the threshold can quickly adapt to business changes under heavy load.

[0078] (2) Iterative optimization logic: Each time the update cycle is reached, steps 1 to 3 can be repeated to generate a new performance degradation judgment threshold. If the fluctuation range of the threshold generated multiple times is less than the preset fluctuation threshold, the threshold is temporarily locked to reduce invalid calculation overhead until the business pressure level changes.

[0079] (3) Output and application: The updated performance degradation judgment threshold is output as the basis for the next round of performance degradation judgment, forming a closed loop of dynamic threshold optimization. Therefore, the embodiments of this application dynamically generate performance degradation thresholds through operations such as business pressure level classification and historical fluctuation feature modeling, so as to adapt to load changes under different business scenarios, avoid misjudgment or missed judgment caused by fixed thresholds, and combine periodic iterative optimization to form a closed loop of dynamic threshold adjustment, thereby improving the accuracy of performance degradation judgment and ensuring the stable operation of the network card RSS (Receive Side Scaling) load balancing system.
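Step 4 of [0077] to [0079] can be sketched as a small stateful updater. This is an illustration only: the class name, the lock window, the lock tolerance and the per-level periods are all hypothetical. The `compute` callback stands in for one run of Steps 1 to 3. Recomputation is skipped while the threshold is locked, and the lock is released when the pressure level changes.

```python
from typing import Callable, Dict, List, Optional


class ThresholdUpdater:
    """Periodic threshold refresh with temporary locking (Step 4 sketch)."""

    # Hypothetical update periods in seconds: longer under light load, shorter under heavy load.
    PERIODS: Dict[str, int] = {"light": 3600, "medium": 900, "heavy": 300}

    def __init__(self, compute: Callable[[str], float], lock_window: int = 3, lock_tol: float = 0.005):
        self.compute = compute            # runs Steps 1-3 for a pressure level and returns a threshold
        self.lock_window = lock_window
        self.lock_tol = lock_tol
        self.level: Optional[str] = None
        self.history: List[float] = []
        self.locked = False

    def on_cycle(self, level: str) -> float:
        """Called once per update period; returns the threshold for the next degradation judgment."""
        if level != self.level:           # pressure level changed: unlock and start a fresh history
            self.level, self.history, self.locked = level, [], False
        if self.locked:
            return self.history[-1]       # reuse the locked threshold, skipping Steps 1-3
        threshold = self.compute(level)
        self.history.append(threshold)
        recent = self.history[-self.lock_window:]
        if len(recent) == self.lock_window and max(recent) - min(recent) < self.lock_tol:
            self.locked = True            # recent thresholds are stable enough to lock
        return threshold
```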

[0080] Furthermore, embodiments of this application can also analyze and judge the accuracy of the obtained performance evaluation results, the specific process of which is as follows: (1) Obtain the original running data and evaluation process parameters corresponding to the server cluster performance evaluation results, and perform outlier removal and normalization preprocessing on the original running data, and perform correlation verification on the evaluation process parameters to output the data-parameter correlation set. (2) Based on the data-parameter association set, a two-dimensional benchmark verification model of "hardware operation baseline model and evaluation process logic verification model" is constructed. The hardware operation baseline model fits the performance boundary of historical steady-state data, and the evaluation process logic verification model verifies the rationality of the mapping between parameter input and result output, and outputs the benchmark verification model. (3) Input the performance evaluation results to be verified into the benchmark verification model, calculate the performance index deviation through the hardware operation baseline model and the process mapping deviation through the evaluation process logic verification model, and then weight and fuse them to obtain the comprehensive verification deviation value. (4) Establish dynamic deviation judgment rules that are linked to the server business pressure level, compare the verification deviation value with the dynamic deviation judgment rules, output the judgment conclusion on the accuracy of the performance evaluation result, and feed the judgment conclusion back to the benchmark verification model to iteratively optimize the model parameters.
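Steps (3) and (4) of [0080] fuse two deviations and compare the result with a level-dependent limit. The source does not give formulas for either deviation, so the sketch below makes three assumptions. First, the performance index deviation is taken as the relative distance of a metric outside its historical steady-state band [lo, hi]. Second, the process mapping deviation is taken as the relative gap between the output expected from the inputs and the reported output. Third, equal fusion weights are used. The allowed-deviation table and all function names are hypothetical, and the feedback step that retunes the verification model is omitted.

```python
from typing import Dict

# Hypothetical allowed comprehensive deviation per business pressure level.
ALLOWED_DEVIATION: Dict[str, float] = {"light": 0.05, "medium": 0.08, "heavy": 0.12}


def performance_deviation(value: float, lo: float, hi: float) -> float:
    """Relative distance of an evaluated metric outside the historical steady-state band [lo, hi]."""
    if lo <= value <= hi:
        return 0.0
    gap = lo - value if value < lo else value - hi
    return gap / max(hi - lo, 1e-9)


def mapping_deviation(expected: float, reported: float) -> float:
    """Relative gap between the output the logic model expects from the inputs and the reported output."""
    return abs(reported - expected) / max(abs(expected), 1e-9)


def evaluation_is_accurate(value: float, lo: float, hi: float, expected: float, reported: float,
                           level: str, w_perf: float = 0.5, w_map: float = 0.5) -> bool:
    """Weighted fusion of both deviations, judged against the level-dependent rule."""
    fused = w_perf * performance_deviation(value, lo, hi) + w_map * mapping_deviation(expected, reported)
    return fused <= ALLOWED_DEVIATION[level]
```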

[0081] The aforementioned raw operational data refers to unprocessed data collected directly from the hardware, software and business operation processes of the server cluster. It is the foundation of performance evaluation and mainly includes the following:
- Hardware-level operational data, such as computing resource data, network resource data and management node data.
- Software-level operational data, such as model operational data and system process data (e.g., memory consumption and thread scheduling frequency).
- Business-level operational data, such as business response data (e.g., service request processing success rate, average response time and business throughput) and performance fluctuation data (e.g., fluctuation amplitude and steady-state duration).

Evaluation process parameters refer to the rule-based and computational parameters configured and generated throughout the performance evaluation process. Examples include the relevant thresholds, server configuration parameters, network structure parameters of the fault diagnosis model, and weight coefficients.

[0082] Therefore, the embodiments of this application achieve accurate determination of the performance evaluation results through a two-dimensional benchmark verification model and dynamic deviation judgment rules. At the same time, the verification capability is continuously optimized by relying on the feedback iteration mechanism, which effectively ensures the reliability of server cluster performance evaluation.

[0083] The execution logic of the server performance evaluation method of this application will be explained below with reference to the accompanying drawings.

[0084] Figure 2 is a schematic diagram illustrating the execution logic of the server performance evaluation method of this application. As shown in Figure 2, the execution process of the method is as follows:

- S201: Initialize the system and load the adaptive neuro-fuzzy inference system model.
- S202: Generate the initial particle swarm.
- S203: Execute the improved particle swarm algorithm.
- S204: Determine whether the particle swarm optimization succeeded. If so, proceed to S205; otherwise, return to S202.
- S205: Obtain the optimal parameter configuration and deploy it to the baseboard management controller production environment of the data center.
- S206: Start long-term monitoring and continuously collect operating data from the baseboard management controllers.
- S207: Based on the operating data of the baseboard management controllers, evaluate the performance of the servers in the server cluster to obtain the change in the performance evaluation index data.
- S208: Determine whether the change in the performance evaluation index data is greater than the preset change threshold. If so, proceed to S209; otherwise, return to S206.
- S209: Trigger re-optimization and return to S202.
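The outer control flow of Figure 2 can be sketched as follows. This is a minimal sketch under these assumptions:

- `optimize`, `deploy`, and `evaluate_change` are hypothetical callables standing in for S202 to S203, S205, and S206 to S207. `optimize` returns `None` when an optimization run fails.
- S201 (loading the ANFIS model) is assumed to happen inside `optimize` or before the call.
- Long-term monitoring (S206) is open-ended in the source, so it is bounded here by a hypothetical `max_samples`.
- Retries after a failed optimization are bounded by a hypothetical `max_retries`.

```python
from typing import Callable, List, Optional


def run_evaluation_loop(
    optimize: Callable[[], Optional[dict]],   # S202-S203: new swarm + improved PSO; None on failure
    deploy: Callable[[dict], None],           # S205: push config to the BMC production environment
    evaluate_change: Callable[[], float],     # S206-S207: collect BMC data, return index change value
    change_threshold: float,                  # S208: preset change threshold
    max_samples: int = 100,                   # illustrative bound on the monitoring loop
    max_retries: int = 5,                     # illustrative bound on S204 -> S202 retries
) -> List[dict]:
    """Return every configuration that was deployed, in deployment order."""

    def optimize_until_success() -> dict:
        for _ in range(max_retries):
            config = optimize()               # S202-S203
            if config is not None:            # S204: optimization succeeded
                return config
        raise RuntimeError("particle swarm optimization did not succeed")

    deployed = [optimize_until_success()]
    deploy(deployed[-1])                      # S205
    for _ in range(max_samples):              # S206: monitoring, bounded for illustration
        change = evaluate_change()            # S207
        if change > change_threshold:         # S208
            deployed.append(optimize_until_success())  # S209: re-optimize from S202
            deploy(deployed[-1])
    return deployed
```

Passing the stages in as callables keeps the control flow separate from the BMC-specific deployment and measurement code, which the source does not specify.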

[0085] The improved particle swarm optimization algorithm of this application is described below with reference to the accompanying drawings.

[0086] Figure 3 is a schematic diagram illustrating the execution flow of the improved particle swarm optimization algorithm of this application. As shown in Figure 3, the execution flow is as follows:

- S301: Set the iteration counter to zero.
- S302: Determine whether the current iteration count is less than the maximum iteration count. If so, proceed to S303; otherwise, go to S3020.
- S303: Calculate the particle swarm diversity.
- S304: Use the adaptive neuro-fuzzy inference system model to dynamically calculate the particle swarm inertia weight.
- S305: Begin iterating over each particle in the particle swarm.
- S306: Decode the particle position vector into a specific baseboard management controller parameter configuration.
- S307: Deploy the decoded parameters to the baseboard management controllers of the server cluster and run diagnostic tests.
- S308: Collect operating data of the baseboard management controllers, including indicators such as power consumption, diagnostic accuracy, and utilization rate.
- S309: Calculate the current fitness value of the current particle based on the collected baseboard management controller data.

[0087] The flow continues as follows:

- S3010: Determine whether the current fitness value is better than the current particle's historical best fitness value. If so, go to S3011; otherwise, go to S3012.
- S3011: Update the particle's individual historical best position and historical best fitness value, then go to S3013.
- S3012: Keep the particle's individual historical best position unchanged, then go to S3013.
- S3013: Determine whether the current fitness value is better than the global best fitness value. If so, go to S3014; otherwise, go to S3015.
- S3014: Update the global best position and global best fitness value, then go to S3016.
- S3015: Keep the global best position unchanged, then go to S3016.
- S3016: Update the particle's velocity and position vectors.
- S3017: Determine whether all particles have been processed. If so, proceed to S3018; otherwise, return to S305 for the next particle.
- S3018: Determine whether the algorithm meets the convergence requirement. If it does not, proceed to S3019; otherwise, proceed to S3020.
- S3019: Increment the iteration counter by one and return to S302.
- S3020: Return the global best solution as the optimization result, and determine the target fault diagnosis model parameters based on that result.
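The S301 to S3020 loop can be illustrated as a standard particle swarm optimizer with an externally supplied inertia weight. This is a minimal sketch, not the application's implementation. It makes these assumptions:

- The `fitness` callable stands in for S306 to S309 (decode, deploy, run diagnostics, score).
- The `inertia_weight` callable stands in for the ANFIS step S304.
- Lower fitness is treated as better. If fitness is defined so that larger is better, the two comparisons flip.
- The S3018 convergence test is modeled as "the global best has not improved by more than `tol` for `patience` iterations".
- Positions are clamped to the search box.

All parameter names and defaults are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def improved_pso(
    fitness: Callable[[Vector], float],
    bounds: Sequence[Tuple[float, float]],
    inertia_weight: Callable[[List[Vector]], float],
    n_particles: int = 20,
    max_iter: int = 50,
    c1: float = 1.5,
    c2: float = 1.5,
    tol: float = 1e-6,
    patience: int = 5,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimise `fitness` over a box; lower fitness is treated as better (assumption)."""
    rng = random.Random(seed)
    dim = len(bounds)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [float("inf")] * n_particles
    gbest, gbest_f = xs[0][:], float("inf")
    stall = 0

    for _ in range(max_iter):                   # S301/S302/S3019: bounded iteration counter
        w = inertia_weight(xs)                  # S303-S304: diversity-driven inertia weight
        prev_best = gbest_f
        for i in range(n_particles):            # S305/S3017: visit every particle
            f = fitness(xs[i])                  # S306-S309: decode, deploy, measure, score
            if f < pbest_f[i]:                  # S3010-S3012: individual best
                pbest[i], pbest_f[i] = xs[i][:], f
            if f < gbest_f:                     # S3013-S3015: global best
                gbest, gbest_f = xs[i][:], f
            for d in range(dim):                # S3016: velocity and position update
                r1, r2 = rng.random(), rng.random()
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (pbest[i][d] - xs[i][d])
                            + c2 * r2 * (gbest[d] - xs[i][d]))
                lo, hi = bounds[d]
                xs[i][d] = min(max(xs[i][d] + vs[i][d], lo), hi)  # stay inside the box
        # S3018: treat the swarm as converged once the global best stalls for `patience` iterations
        stall = stall + 1 if prev_best - gbest_f < tol else 0
        if stall >= patience:
            break
    return gbest, gbest_f                       # S3020
```

In use, `fitness` would wrap the BMC deployment and measurement, and `inertia_weight` would wrap the ANFIS regulator. A constant such as `lambda xs: 0.7` reduces the loop to ordinary PSO.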

[0088] It should be noted that, in practice, this application applies to server fault diagnosis and performance evaluation but is not limited to server nodes. It can also serve as a reference for other BMC-equipped nodes, such as computers, switches, and industrial control computers.

[0089] Therefore, this application treats each BMC node as an independent optimization unit and optimizes single-node diagnostic parameters through a local objective function. It also introduces an improved particle swarm optimization algorithm integrated with ANFIS, which achieves collaborative optimization of multiple BMC nodes through global resource monitoring. Specifically:

- A load balancing factor quantifies the resource competition between nodes.
- ANFIS inference dynamically adjusts the particle swarm algorithm parameters.
- Dynamic penalties suppress system resource overload.

Together these form a hybrid optimization framework that balances local diagnostic accuracy against global resource efficiency. Furthermore, through real-time status analysis and the parameter adjustment mechanism supported by ANFIS, this application automatically optimizes search strategies and target weights. This improves diagnostic stability and accuracy in heterogeneous hardware environments and provides more reliable intelligent operation and maintenance for ultra-large-scale data centers.

[0090] Furthermore, this application can also construct a corresponding server performance evaluation system based on the aforementioned server performance evaluation method.

[0091] Figure 4 is a schematic diagram of the logical architecture of the server performance evaluation system of this application. As shown in Figure 4, the system mainly includes a central optimization controller, a model sharing repository, a server cluster, and BMC nodes (i.e., baseboard management controller nodes).

[0092] Central optimization controller: The core of the system. It formulates optimization strategies and uses the improved particle swarm algorithm to dynamically adjust and distribute the fault diagnosis parameter configuration of each BMC. It also manages the models in the model sharing repository, achieving a global balance among resource usage, power consumption, and diagnostic efficiency.

[0093] Model sharing repository: Centrally stores various fault diagnosis models, supports version management, and distributes models to BMC nodes as needed.

[0094] Server cluster: Composed of multiple servers, it executes actual business operations and collects and feeds back BMC operation data for optimization and analysis.

[0095] BMC node: Deploys a lightweight fault diagnosis model to monitor hardware status in real time and apply optimization parameters.

[0096] Figure 5 is a schematic diagram of the logical architecture of the central optimization controller. As shown in Figure 5, the central optimization controller of this application mainly includes an optimization strategy manager, a particle swarm optimizer, an ANFIS intelligent regulator (i.e., an adaptive neuro-fuzzy inference system intelligent regulator), and a performance monitor.

[0097] Optimization Strategy Manager: Sets global optimization goals and coordinates the workflow of each module.

[0098] Particle Swarm Optimizer: Executes the particle swarm algorithm to optimize the parameters of the fault diagnosis model for BMC nodes.

[0099] ANFIS Intelligent Regulator: Dynamically adjusts the particle swarm inertia weight w through fuzzy inference, which enhances the adaptive capability of the particle swarm optimization algorithm.

[0100] Performance Monitor: Collects particle swarm diversity metrics and BMC node operating data in real time, thus providing data input for the ANFIS intelligent regulator.

[0101] Therefore, this embodiment constructs a centralized system architecture. A central controller coordinates the global strategy, lightweight diagnostic models are deployed on each BMC node, and the BMC parameter configuration is dynamically optimized by the improved particle swarm optimization algorithm. This achieves coordinated optimization of resource utilization, power consumption control, and diagnostic accuracy. In addition, this embodiment designs a dynamic parameter optimization method for the improved particle swarm optimization algorithm. The method is built on the particle swarm optimizer and the ANFIS intelligent regulator, and ANFIS inference adjusts the algorithm parameters in real time. Compared with traditional static configuration methods, this embodiment can improve diagnostic accuracy while reducing both the total power consumption of the BMC cluster and resource contention conflicts. This improves the overall performance and reliability of the fault diagnosis system in large-scale data centers.

[0102] Through the above description of the embodiments, those skilled in the art can clearly understand that the methods according to the above embodiments can be implemented by means of software plus necessary general-purpose hardware platforms. Of course, they can also be implemented by hardware, but in many cases the former is a better implementation method.

[0103] Embodiments of this application also provide a server performance evaluation apparatus.

[0104] As shown in Figure 6, the server performance evaluation apparatus 10 includes a model parameter acquisition module 100, a performance evaluation module 200, and an iterative optimization module 300.

[0105] The model parameter acquisition module 100 performs the following:

- Acquires the configuration parameters of the target server cluster.
- Generates the corresponding particle swarm inertia weights based on the pre-trained neuro-fuzzy inference model.
- Based on the particle swarm inertia weights and the configuration parameters, iteratively executes the preset particle swarm optimization calculation operation. This yields target fault diagnosis model parameters that meet the preset effective solution requirements.

[0106] The performance evaluation module 200 performs the following:

- Runs the baseboard management controllers on which the fault diagnosis model is deployed in the target server cluster, using the target fault diagnosis model parameters.
- Collects the running data of those baseboard management controllers within the target time period.
- Evaluates the performance of at least some of the target servers in the target server cluster based on the running data. This yields the performance evaluation index change values of those target servers.

[0107] The iterative optimization module 300 acts when the performance evaluation index change value is greater than the preset change threshold. In that case, it iteratively executes the preset particle swarm optimization calculation operation based on the particle swarm inertia weights and configuration parameters, and performs the performance evaluation operation. It repeats this until the change value obtained during the iteration is less than or equal to the preset change threshold. It then generates the final performance evaluation index change value for the at least some target servers, and obtains their performance evaluation results from that final value.

[0108] Optionally, in one embodiment of this application, the model parameter acquisition module 100 includes a determination unit, an acquisition unit, a conversion unit, and a defuzzification unit.

[0109] The determination unit performs the following:

- Acquires the historical running data of the target server cluster and the particle swarm optimization log of the preset particle swarm algorithm.
- Trains the neuro-fuzzy inference model on the historical running data and the particle swarm optimization log.
- Determines the input feature matching rules of the neuro-fuzzy inference model.

[0110] The acquisition unit is used to acquire particle swarm diversity features corresponding to the particle swarm algorithm and node operating parameters of the baseboard management controller based on the input feature matching rules. The particle swarm diversity features include particle dispersion and particle aggregation, and the node operating parameters include hardware status data and resource usage data.
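The source names particle dispersion and particle aggregation as diversity features but does not define them. The sketch below uses one plausible pair of definitions, which are assumptions:

- Dispersion is the mean distance of the particles to the swarm centroid, normalized by the diagonal of the search box.
- Aggregation is the fraction of particles lying within a hypothetical `radius_frac` of that diagonal from the centroid.

```python
import math
from typing import List, Sequence, Tuple


def diversity_features(positions: List[List[float]],
                       bounds: Sequence[Tuple[float, float]],
                       radius_frac: float = 0.1) -> Tuple[float, float]:
    """Return (dispersion, aggregation) for a swarm; both definitions are assumptions."""
    n = len(positions)
    dim = len(bounds)
    # Length of the search-box diagonal, used to make the features scale-free
    diag = math.sqrt(sum((hi - lo) ** 2 for lo, hi in bounds))
    centroid = [sum(p[d] for p in positions) / n for d in range(dim)]
    dists = [math.dist(p, centroid) for p in positions]
    dispersion = sum(dists) / n / diag                     # 0 when all particles coincide
    aggregation = sum(1 for d in dists if d <= radius_frac * diag) / n  # share near the centroid
    return dispersion, aggregation
```

Normalizing by the box diagonal keeps both features in a comparable range across differently scaled parameter spaces, which suits them as fuzzy-inference inputs.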

[0111] The conversion unit inputs the particle swarm diversity features and node operating parameters into the trained neuro-fuzzy inference model. The model applies preset fuzzification and rule-based inference to them to obtain the corresponding inference results.

[0112] The defuzzification unit applies preset defuzzification to the inference results to generate particle swarm inertia weights that meet the preset adaptation requirements.
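The fuzzification, rule inference, and defuzzification chain of [0111] and [0112] can be illustrated with a first-order Sugeno system, which is the structure ANFIS uses. This is a hand-parameterized sketch. In the described method, the membership and consequent parameters would be trained on historical running data and PSO logs. The assumptions are:

- The two inputs (swarm dispersion and node load, both in [0, 1]) are hypothetical choices.
- The Gaussian membership functions and rule consequents are hypothetical values.
- The "adaptation requirement" is modeled as clipping the output to a hypothetical range of [0.4, 0.9].

```python
import math
from itertools import product

# Hypothetical Gaussian membership functions (centre, width) on [0, 1] inputs.
MF = {"low": (0.0, 0.35), "high": (1.0, 0.35)}

# Hypothetical first-order Sugeno consequents: w = p * dispersion + q * load + r.
CONSEQUENTS = {
    ("low", "low"): (0.0, 0.0, 0.85),     # collapsed swarm, idle node: favour exploration
    ("low", "high"): (0.0, -0.1, 0.80),
    ("high", "low"): (-0.2, 0.0, 0.65),
    ("high", "high"): (-0.2, -0.1, 0.60),  # spread-out swarm, busy node: favour exploitation
}

W_MIN, W_MAX = 0.4, 0.9  # assumed admissible inertia-weight range


def gaussian(x: float, centre: float, width: float) -> float:
    return math.exp(-((x - centre) ** 2) / (2 * width ** 2))


def infer_inertia_weight(dispersion: float, load: float) -> float:
    num = den = 0.0
    for d_label, u_label in product(MF, MF):
        # Fuzzification + rule firing strength (product T-norm)
        strength = gaussian(dispersion, *MF[d_label]) * gaussian(load, *MF[u_label])
        p, q, r = CONSEQUENTS[(d_label, u_label)]
        num += strength * (p * dispersion + q * load + r)
        den += strength
    w = num / den                       # weighted-average defuzzification
    return min(max(w, W_MIN), W_MAX)   # enforce the assumed adaptation range
```

With these illustrative rules, a collapsed swarm on an idle node yields a high inertia weight of about 0.85, which pushes the swarm to explore. A dispersed swarm on a loaded node is clipped to the lower bound.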

[0113] Optionally, in one embodiment of this application, the model parameter acquisition module 100 further includes: a traversal unit, a decoding unit, an acquisition unit, an update unit, and a judgment unit.

[0114] The traversal unit is used to traverse a preset particle swarm based on the particle swarm inertia weight and configuration parameters to obtain the current position vector and current velocity vector of the particles in the particle swarm.

[0115] The decoding unit is used to decode the current position vector to obtain the corresponding position decoding data, and send the position decoding data to the fault diagnosis model of the baseboard management controller in the target server cluster. It also runs the baseboard management controller according to the position decoding data to collect the corresponding controller operation data, which includes operating power consumption, diagnostic accuracy and diagnostic utilization.
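The source does not list which BMC parameters a position vector encodes. The sketch below shows one common way to decode a continuous position into a mixed discrete/continuous configuration. The three parameters are assumptions, and the names `BmcConfig`, `decode_position`, and `pick` are illustrative only:

- a sampling interval
- a diagnostic model variant
- an alarm threshold

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical BMC parameter space; the source does not specify the real parameters.
SAMPLING_INTERVALS_S: Tuple[int, ...] = (1, 5, 10, 30, 60)
MODEL_VARIANTS: Tuple[str, ...] = ("tiny", "small", "base")


@dataclass(frozen=True)
class BmcConfig:
    sampling_interval_s: int
    model_variant: str
    alarm_threshold: float  # anomaly-score threshold in [0.5, 0.99]


def pick(options: Tuple[T, ...], x: float) -> T:
    """Map x in [0, 1] to one option; x == 1 maps to the last option."""
    idx = min(int(x * len(options)), len(options) - 1)
    return options[idx]


def decode_position(position: Sequence[float]) -> BmcConfig:
    if len(position) < 3:
        raise ValueError("position vector needs at least 3 dimensions")
    x = [min(max(v, 0.0), 1.0) for v in position[:3]]  # clip into the unit cube
    return BmcConfig(
        sampling_interval_s=pick(SAMPLING_INTERVALS_S, x[0]),
        model_variant=pick(MODEL_VARIANTS, x[1]),
        alarm_threshold=0.5 + 0.49 * x[2],
    )
```

Keeping the swarm in a continuous unit cube and discretizing only at decode time lets the standard velocity update be used unchanged for categorical parameters.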

[0116] The acquisition unit is used to calculate the current fitness value of the current particle in the particle swarm based on the controller's operating data, and to acquire the global optimization position and global fitness optimization value of the particle swarm that meet the preset optimization requirements.

[0117] The update unit is used to update the global optimal position and global fitness value of the particle swarm in response to the current fitness value being greater than the global fitness optimization value, and to update the current position vector and current velocity vector of the current particle.

[0118] The judgment unit is used to determine whether the preset particle swarm optimization calculation operation meets the preset iteration termination number requirement and convergence requirement. If the preset iteration termination number requirement and convergence requirement are met, the target fault diagnosis model parameters are determined based on the updated global optimization position and global fitness optimization value. Otherwise, the preset particle swarm optimization calculation operation is performed again on the particle swarm using the updated current position vector and current velocity vector until the preset iteration termination number requirement and convergence requirement are met, so as to obtain the corresponding target fault diagnosis model parameters.

[0119] Optionally, in one embodiment of this application, the performance evaluation module 200 includes: a running unit and a computing unit.

[0120] The running unit is used to send the target fault diagnosis model parameters to the corresponding baseboard management controller, so that the baseboard management controller runs the corresponding fault diagnosis model according to the target fault diagnosis model parameters and collects the running data of the baseboard management controller within the target time period.

[0121] The calculation unit is used to perform preset performance evaluation operations on multiple target servers in the target server cluster based on the running data, so as to obtain the minimum and maximum performance evaluation indicators of the multiple target servers within the target time period, and calculate the change value of the performance evaluation indicators of the multiple target servers within the target time period based on the minimum and maximum performance evaluation indicators.
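Per server, the change value can be computed from the minimum and maximum index within the target time period. The sketch below assumes the following, none of which the source states exactly:

- The change value is simply the maximum minus the minimum.
- Re-optimization is triggered if any evaluated server exceeds the threshold.

The function names are hypothetical.

```python
from typing import Dict, List


def index_change_values(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Change value per server, taken here as max minus min over the target time period."""
    return {server: max(vals) - min(vals) for server, vals in samples.items() if vals}


def exceeds_threshold(changes: Dict[str, float], threshold: float) -> bool:
    """True if any evaluated server's change value exceeds the preset change threshold."""
    return any(change > threshold for change in changes.values())
```

Servers with no samples in the window are skipped rather than reported as zero change, so missing data cannot mask instability.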

[0122] Optionally, in one embodiment of this application, the update unit includes an analysis subunit and a response subunit.

[0123] The analysis subunit is used to obtain the historical fitness optimization value of the current particle and the global fitness optimization value of the particle swarm that meet the preset optimization requirements. In response to the current fitness value being greater than the historical fitness optimization value, it determines the current optimization position corresponding to the current fitness value. Based on the current fitness value and the current optimization position, it updates the individual historical optimization position and historical fitness optimization value of the current particle, and determines whether the updated historical fitness optimization value is greater than the global fitness optimization value.

[0124] The response subunit is used to update the global optimization position and global fitness value of the particle swarm based on the updated individual historical optimization position and historical fitness value when the updated historical fitness value is greater than the global fitness value.

[0125] Optionally, in one embodiment of this application, the judgment unit includes: an iterative analysis subunit and an optimization subunit.

[0126] The iterative analysis subunit first determines whether the entire particle swarm has been traversed. Once it has, and the preset particle swarm optimization calculation operation meets the preset convergence requirement, the subunit determines whether the operation has reached the preset termination iteration count.

[0127] The optimization subunit acts when the preset particle swarm optimization calculation operation reaches the preset termination iteration count. It takes the updated global optimization position and global fitness optimization value as the target global effective solution of the particle swarm, and determines the target fault diagnosis model parameters based on that solution.

[0128] Optionally, in one embodiment of this application, the updating unit further includes: a first operation subunit, a factor determination subunit, a second operation subunit, a third operation subunit, a fourth operation subunit, and a fifth operation subunit.

[0129] The first operation subunit determines, based on the current velocity vector, the current velocity and current position of the current particle in each dimension. It then calculates the velocity weight product of the particle swarm inertia weight and the current velocity.

[0130] The factor determination subunit is used to determine the individual learning factor and global learning factor corresponding to the preset particle swarm optimization calculation operation based on the current velocity vector, and to obtain the individual convergence random number corresponding to the individual learning factor and the global convergence random number corresponding to the global learning factor.

[0131] The second operation subunit calculates the first position difference between the current particle's individual historical optimization position and its current position. It then calculates the individual position factor product of the first position difference, the individual learning factor, and the individual convergence random number.

[0132] The third operation subunit calculates the second position difference between the global optimization position of the particle swarm and the current position. It then calculates the global position factor product of the second position difference, the global learning factor, and the global convergence random number.

[0133] The fourth operation subunit is used to calculate the new velocity of the current particle in different dimensions based on the velocity weight product, the individual position factor product, and the global position factor product, and to update the current velocity vector of the current particle using the new velocity.

[0134] The fifth operation subunit is used to calculate the position-velocity sum between the current position and the new velocity, and to update the current position vector of the current particle using the position-velocity sum.
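Subunits [0129] to [0134] together describe the canonical particle swarm update. It is written below per dimension $d$ for particle $i$ at iteration $t$, using these symbols, which are introduced here for readability:

- $w$: the inertia weight.
- $c_1$, $c_2$: the individual and global learning factors.
- $r_1$, $r_2$: the individual and global convergence random numbers, usually drawn uniformly from $[0, 1]$.
- $p_{i,d}$: the individual historical optimization position.
- $g_d$: the global optimization position.

The three terms of the first equation are the velocity weight product, the individual position factor product, and the global position factor product. The second equation is the position-velocity sum.

```latex
v_{i,d}^{t+1} = w\, v_{i,d}^{t} + c_1 r_1 \left(p_{i,d} - x_{i,d}^{t}\right) + c_2 r_2 \left(g_{d} - x_{i,d}^{t}\right)

x_{i,d}^{t+1} = x_{i,d}^{t} + v_{i,d}^{t+1}
```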

[0135] Optionally, in one embodiment of this application, the acquisition unit includes a sixth operation subunit, a mean calculation subunit, a misjudgment rate calculation subunit, and a construction subunit.

[0136] The sixth operation subunit is used to determine the total operating power consumption and maximum power consumption of the target server cluster based on the configuration parameters, and to calculate the power consumption normalization data corresponding to the target server cluster based on the total operating power consumption and maximum power consumption.

[0137] The mean calculation subunit determines the CPU utilization of multiple servers in the target server cluster and calculates the average of these utilizations. It then uses that average to determine the utilization standard deviation of the target server cluster.

[0138] The misjudgment rate calculation subunit is used to calculate the diagnostic misjudgment rate of the fault diagnosis model based on the controller's operating data and configuration parameters, and to determine the constraint penalty of the target server cluster.

[0139] The construction subunit builds a fitness function from four inputs: the power consumption normalization data, the utilization standard deviation, the constraint penalty, and the diagnostic misjudgment rate. The fitness function is then used to calculate the current fitness value of the current particle in the particle swarm.
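The source lists the four ingredients of the fitness function but not how they are combined. The sketch below uses a weighted sum plus an additive penalty, with these assumptions:

- Lower fitness is better, since every term is a cost.
- The default weights are hypothetical.
- The constraint penalty, which [0089] describes as dynamic, is passed in already computed.
- The utilization standard deviation is the population standard deviation.

```python
import statistics
from typing import Sequence, Tuple


def fitness(total_power_w: float, max_power_w: float,
            cpu_utilizations: Sequence[float],
            misjudgment_rate: float, constraint_penalty: float,
            weights: Tuple[float, float, float] = (0.3, 0.3, 0.4)) -> float:
    """Weighted-sum fitness; lower is better (assumption)."""
    if max_power_w <= 0:
        raise ValueError("maximum power consumption must be positive")
    power_norm = total_power_w / max_power_w              # power consumption normalization
    util_std = statistics.pstdev(cpu_utilizations)        # load imbalance across servers
    a, b, c = weights
    return a * power_norm + b * util_std + c * misjudgment_rate + constraint_penalty
```

With identical power and misjudgment rate, a balanced cluster (all CPUs at 0.5) scores 0.19, while an unbalanced one (0.2 and 0.8) scores 0.28. This is how the utilization term steers the swarm toward load balancing.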

[0140] Optionally, in one embodiment of this application, the misjudgment rate calculation subunit includes an injection subunit, a statistics subunit, and a misjudgment rate calculation subunit.

[0141] The injection subunit is used to inject various types of faults into the fault diagnosis model, and to perform fault diagnosis on the various faults through the fault diagnosis model to obtain the corresponding fault diagnosis results.

[0142] The statistics subunit is used to determine at least one target fault diagnosis result that meets the preset requirements for accurate fault diagnosis, and to count the number of faults for at least one target fault diagnosis result.

[0143] The misjudgment rate calculation subunit is used to calculate the diagnostic accuracy rate corresponding to the fault diagnosis model based on the number of faults, and to calculate the diagnostic misjudgment rate corresponding to the fault diagnosis model based on the diagnostic accuracy rate.
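Steps [0141] to [0143] reduce to counting correct diagnoses over a set of injected faults. The sketch below makes two assumptions:

- A diagnosis is "accurate" when the diagnosed fault label equals the injected one.
- The misjudgment rate is one minus the diagnostic accuracy rate.

The function name and the fault labels are hypothetical.

```python
from typing import Sequence


def misjudgment_rate(injected: Sequence[str], diagnosed: Sequence[str]) -> float:
    """1 - accuracy over injected faults; exact label match counts as accurate (assumption)."""
    if not injected or len(injected) != len(diagnosed):
        raise ValueError("need one diagnosis per injected fault")
    correct = sum(1 for truth, pred in zip(injected, diagnosed) if truth == pred)
    accuracy = correct / len(injected)    # diagnostic accuracy rate
    return 1.0 - accuracy                 # diagnostic misjudgment rate
```

For example, injecting `["fan", "psu", "mem", "fan"]` and diagnosing `["fan", "psu", "cpu", "fan"]` gives an accuracy of 0.75 and a misjudgment rate of 0.25.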

[0144] For a description of the features in the embodiment corresponding to the server performance evaluation device, please refer to the relevant description in the embodiment corresponding to the server performance evaluation method, which will not be repeated here.

[0145] Embodiments of this application also provide an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to perform the steps in any of the above-described server performance evaluation method embodiments.

[0146] Embodiments of this application also provide a non-volatile computer-readable storage medium storing a computer program, wherein the computer program is configured to execute the steps in any of the above-described server performance evaluation method embodiments at runtime.

[0147] In one exemplary embodiment, the aforementioned non-volatile computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as USB flash drives, read-only memory (ROM), random access memory (RAM), portable hard drives, magnetic disks, or optical disks.

[0148] Embodiments of this application also provide a computer program product, which includes a computer program that, when executed by a processor, implements the steps in any of the above-described server performance evaluation method embodiments.

[0149] Embodiments of this application also provide another computer program product, including a non-volatile computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps in any of the above-described server performance evaluation method embodiments.

[0150] Those skilled in the art will further recognize that the units and algorithm steps of the various examples described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the various examples have been generally described in terms of functionality in the foregoing description. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

[0151] The foregoing has provided a detailed description of a server performance evaluation method, apparatus, device, and medium provided in this application. Specific examples have been used to illustrate the principles and implementation methods of this application. The descriptions of the embodiments above are merely for the purpose of helping to understand the method and its core ideas. It should be noted that those skilled in the art can make various improvements and modifications to this application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of this application.

Claims

1. A method for evaluating server performance, characterized in that it includes the following steps: obtaining the configuration parameters of the target server cluster, generating corresponding particle swarm inertia weights according to a pre-trained neuro-fuzzy inference model, and, based on the particle swarm inertia weights and the configuration parameters, iteratively executing a preset particle swarm optimization calculation operation to obtain target fault diagnosis model parameters that meet preset effective solution requirements; running, using the target fault diagnosis model parameters, the baseboard management controller on which the fault diagnosis model is deployed in the target server cluster to collect the running data of the baseboard management controller during a target time period, and evaluating, based on the running data, the performance of at least some of the target servers in the target server cluster to obtain the performance evaluation index change values of the at least some target servers; and, in response to the performance evaluation index change value being greater than a preset change threshold, iteratively executing the preset particle swarm optimization calculation operation based on the particle swarm inertia weights and the configuration parameters and performing the performance evaluation operation until the performance evaluation index change value obtained during the iteration is less than or equal to the preset change threshold, and generating the final performance evaluation index change value corresponding to the at least some target servers, so as to obtain the performance evaluation results of the at least some target servers based on the final performance evaluation index change value.

2. The server performance evaluation method according to claim 1, characterized in that the step of generating corresponding particle swarm inertia weights based on a pre-trained neuro-fuzzy inference model includes: obtaining the historical running data of the target server cluster and the particle swarm optimization log corresponding to the preset particle swarm algorithm; training the neuro-fuzzy inference model based on the historical running data and the particle swarm optimization log, and determining the input feature matching rules of the neuro-fuzzy inference model; collecting, based on the input feature matching rules, the particle swarm diversity features corresponding to the particle swarm algorithm and the node operating parameters of the baseboard management controller, wherein the particle swarm diversity features include particle dispersion and particle aggregation, and the node operating parameters include hardware status data and resource usage data; inputting the particle swarm diversity features and the node operating parameters into the trained neuro-fuzzy inference model to perform preset fuzzification and rule-based inference on them and obtain the corresponding inference results; and applying preset defuzzification to the inference results to generate particle swarm inertia weights that meet preset adaptation requirements.

3. The server performance evaluation method according to claim 2, characterized in that the step of iteratively executing a preset particle swarm optimization calculation operation based on the particle swarm inertia weight and the configuration parameters to obtain target fault diagnosis model parameters that meet preset effective solution requirements includes: traversing the preset particle swarm based on the particle swarm inertia weight and the configuration parameters to obtain the current position vector and current velocity vector of the particles in the particle swarm; decoding the current position vector to obtain corresponding position decoding data, and sending the position decoding data to the fault diagnosis model of the baseboard management controller in the target server cluster; running the baseboard management controller according to the position decoding data to collect corresponding controller operating data, wherein the controller operating data include operating power consumption, diagnostic accuracy, and diagnostic utilization; calculating the current fitness value of the current particle in the particle swarm based on the controller operating data, and obtaining the global optimization position and global fitness optimization value of the particle swarm that meet preset optimization requirements; in response to the current fitness value being greater than the global fitness optimization value, updating the global optimization position and the global fitness optimization value of the particle swarm, and updating the current position vector and the current velocity vector of the current particle; and determining whether the preset particle swarm optimization calculation operation meets the preset iteration termination number requirement and convergence requirement.
If the preset iteration termination number requirement and convergence requirement are met, the target fault diagnosis model parameters are determined based on the updated global optimization position and global fitness optimization value. Otherwise, the preset particle swarm optimization calculation operation is performed again using the updated current position vector and current velocity vector until the preset iteration termination number requirement and convergence requirement are met, thereby obtaining the corresponding target fault diagnosis model parameters.
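The overall loop of claim 3 (decode each particle's position, evaluate it, keep the best, move the particle, repeat) can be sketched as a standard global-best PSO that maximizes fitness, matching the claim's "greater than" comparisons. This is a hypothetical illustration, not the applicant's implementation: `decode` simply clips positions into assumed parameter bounds, the `evaluate` callback stands in for pushing parameters to the BMC fault diagnosis model and scoring its operating data, the inertia weight is a fixed constant rather than the neurofuzzy output, and termination uses only an iteration cap.

```python
import random
from typing import Callable, List, Sequence, Tuple

def decode(position: Sequence[float], bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """Hypothetical decoding: clip each coordinate into its model-parameter range."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(position, bounds)]

def pso_optimize(evaluate: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 20, max_iter: int = 50,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO that maximizes evaluate(decoded_parameters)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [float("-inf")] * n_particles
    gbest, gbest_fit = pos[0][:], float("-inf")

    for _ in range(max_iter):
        for i in range(n_particles):              # traverse the swarm
            params = decode(pos[i], bounds)       # position -> model parameters
            fit = evaluate(params)                # stand-in for running the BMC model
            if fit > pbest_fit[i]:                # personal best first, then global best
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
            for d in range(dim):                  # velocity and position update
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
    return decode(gbest, bounds), gbest_fit
```

As a usage example, `pso_optimize(lambda p: -((p[0] - 1) ** 2 + (p[1] + 2) ** 2), [(-5, 5), (-5, 5)])` should return parameters close to (1, -2). In the claimed method, `w` would be regenerated by the neurofuzzy model and the loop would also check a convergence requirement.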

4. The server performance evaluation method according to claim 3, characterized in that the step of running the baseboard management controller equipped with the fault diagnosis model in the target server cluster using the target fault diagnosis model parameters to collect the operating data of the baseboard management controller during a target time period, and performing performance evaluation on at least a portion of the target servers in the target server cluster based on the operating data to obtain changes in the performance evaluation indicators of the at least a portion of the target servers, includes: sending the target fault diagnosis model parameters to the corresponding baseboard management controller, so that the baseboard management controller runs the corresponding fault diagnosis model according to the target fault diagnosis model parameters and collects its operating data within the target time period; and performing, based on the operating data, a preset performance evaluation operation on multiple target servers in the target server cluster to obtain the minimum and maximum performance evaluation indicators of the multiple target servers within the target time period, and calculating the change value of the performance evaluation indicators of the multiple target servers within the target time period based on the minimum and maximum performance evaluation indicators.
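Claim 4 derives a change value for each server from the minimum and maximum indicator observed during the target time period. A minimal sketch, assuming the change value is simply the range (maximum minus minimum) and that indicator samples have already been collected per server; the dictionary layout and function name are hypothetical.

```python
from typing import Dict, List, Tuple

def metric_change(samples: Dict[str, List[float]]) -> Dict[str, Tuple[float, float, float]]:
    """Return (minimum, maximum, change) of each server's indicator over the target window.

    Assumption: change = maximum - minimum; servers with no samples are skipped.
    """
    result: Dict[str, Tuple[float, float, float]] = {}
    for server, values in samples.items():
        if not values:
            continue  # nothing collected for this server in the window
        lo, hi = min(values), max(values)
        result[server] = (lo, hi, hi - lo)
    return result
```

For instance, samples of 20, 50 and 35 for one server give a change value of 30. A relative change such as `(hi - lo) / hi` would be an equally plausible reading of the claim.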

5. The server performance evaluation method according to claim 3, characterized in that the step of updating the global optimization position and the global fitness optimization value of the particle swarm in response to the current fitness value being greater than the global fitness optimization value includes: obtaining the historical fitness optimization value of the current particle and the global fitness optimization value of the particle swarm that meet the preset optimization requirements; in response to the current fitness value being greater than the historical fitness optimization value, determining the current optimization position corresponding to the current fitness value; updating the individual historical optimization position and the historical fitness optimization value of the current particle based on the current fitness value and the current optimization position, and determining whether the updated historical fitness optimization value is greater than the global fitness optimization value; and, in response to the updated historical fitness optimization value being greater than the global fitness optimization value, updating the global optimization position and the global fitness optimization value of the particle swarm based on the updated individual historical optimization position and historical fitness optimization value.
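The two-stage update of claim 5 (personal best first, global best only if the new personal best beats it) can be written as a small function. This sketch assumes higher fitness is better, as the claim's "greater than" comparisons imply; the `Particle` and `SwarmBest` containers are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Particle:
    position: List[float]
    best_position: List[float] = field(default_factory=list)  # individual historical best
    best_fitness: float = float("-inf")

@dataclass
class SwarmBest:
    position: List[float] = field(default_factory=list)       # global best position
    fitness: float = float("-inf")

def update_bests(p: Particle, fitness: float, swarm: SwarmBest) -> bool:
    """Update the particle's personal best, then the swarm's global best.

    Returns True only when the global best changed.
    """
    if fitness <= p.best_fitness:
        return False  # no personal improvement, so the global best cannot improve either
    p.best_position, p.best_fitness = p.position[:], fitness
    if p.best_fitness > swarm.fitness:
        swarm.position, swarm.fitness = p.best_position[:], p.best_fitness
        return True
    return False
```

Checking the personal best first means the global comparison runs only for particles that actually improved, which is the ordering the claim describes.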

6. The server performance evaluation method according to claim 5, characterized in that the step of determining whether the preset particle swarm optimization calculation operation meets the preset iteration termination number requirement and convergence requirement, and, if both are met, determining the target fault diagnosis model parameters based on the updated global optimization position and global fitness optimization value, includes: determining whether the particle swarm has been traversed; after the particle swarm has been traversed, and in response to the preset particle swarm optimization calculation operation meeting the preset convergence requirement, determining whether the preset particle swarm optimization calculation operation has reached the preset iteration termination number; and, in response to the preset particle swarm optimization calculation operation reaching the preset iteration termination number, taking the updated global optimization position and global fitness optimization value as the target global effective solution of the particle swarm, and determining the target fault diagnosis model parameters based on the target global effective solution.
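Claim 6 orders the checks as: swarm fully traversed, then convergence, then iteration count. The claim does not define convergence, so the sketch below assumes a hypothetical criterion: the global best fitness improved by less than `tol` over the last `window` iterations. Both function names and default values are illustrative assumptions.

```python
from typing import Sequence

def has_converged(gbest_history: Sequence[float], window: int = 5, tol: float = 1e-6) -> bool:
    """Hypothetical criterion: global best improved by less than tol over the last window iterations."""
    if len(gbest_history) <= window:
        return False  # not enough history to judge
    return gbest_history[-1] - gbest_history[-1 - window] < tol

def should_terminate(iteration: int, max_iter: int, gbest_history: Sequence[float],
                     swarm_traversed: bool, window: int = 5, tol: float = 1e-6) -> bool:
    """Apply the checks in the order given by claim 6; all three must hold to stop."""
    if not swarm_traversed:
        return False  # only evaluated after every particle has been processed
    if not has_converged(gbest_history, window, tol):
        return False
    return iteration >= max_iter
```

Read literally, the claim requires both conditions, so an unconverged swarm keeps iterating past the nominal limit; a practical implementation would likely also impose a hard cap.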

7. The server performance evaluation method according to claim 6, characterized in that updating the current position vector and the current velocity vector of the current particle includes: determining, based on the current velocity vector, the current velocity and current position of the current particle in each dimension, and calculating the product of the particle swarm inertia weight and the current velocity (the velocity weight product); determining, based on the current velocity vector, the individual learning factor and the global learning factor of the preset particle swarm optimization calculation operation, and obtaining the individual convergence random number corresponding to the individual learning factor and the global convergence random number corresponding to the global learning factor; calculating a first position difference between the individual historical optimization position of the current particle and the current position, and calculating the individual position factor product of the first position difference, the individual learning factor, and the individual convergence random number; calculating a second position difference between the global optimization position of the particle swarm and the current position, and calculating the global position factor product of the second position difference, the global learning factor, and the global convergence random number; calculating the new velocity of the current particle in each dimension from the velocity weight product, the individual position factor product, and the global position factor product, and updating the current velocity vector of the current particle with the new velocity; and calculating the position-velocity sum of the current position and the new velocity, and updating the current position vector of the current particle with the position-velocity sum.
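The steps of claim 7 correspond to the standard PSO update in each dimension d: v_d ← w·v_d + c1·r1·(pbest_d − x_d) + c2·r2·(gbest_d − x_d), followed by x_d ← x_d + v_d, where w is the inertia weight, c1 and c2 are the individual and global learning factors, and r1, r2 are uniform random numbers in [0, 1). The sketch below assumes fresh random numbers per dimension (the claim does not say) and uses hypothetical function and parameter names.

```python
import random
from typing import List, Sequence, Tuple

def update_particle(x: Sequence[float], v: Sequence[float],
                    pbest: Sequence[float], gbest: Sequence[float],
                    w: float, c1: float, c2: float,
                    rng: random.Random) -> Tuple[List[float], List[float]]:
    """Return (new_position, new_velocity) for one particle."""
    new_x: List[float] = []
    new_v: List[float] = []
    for d in range(len(x)):
        velocity_weight = w * v[d]                   # inertia weight x current velocity
        r1, r2 = rng.random(), rng.random()          # individual / global random numbers
        individual = c1 * r1 * (pbest[d] - x[d])     # individual position factor product
        global_term = c2 * r2 * (gbest[d] - x[d])    # global position factor product
        vd = velocity_weight + individual + global_term
        new_v.append(vd)                             # new velocity in dimension d
        new_x.append(x[d] + vd)                      # position-velocity sum
    return new_x, new_v
```

When a particle already sits on both its personal and the global best, the attraction terms vanish and the update reduces to damping the velocity by `w`; for example, position [1.0, 2.0] with velocity [0.5, -1.0] and `w = 0.5` moves to [1.25, 1.5].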

8. The server performance evaluation method according to claim 7, characterized in that the step of calculating the current fitness value of the current particle in the particle swarm based on the controller operating data includes: determining, based on the configuration parameters, the total operating power consumption and the maximum power consumption of the target server cluster, and calculating the normalized power consumption data of the target server cluster from the total operating power consumption and the maximum power consumption; determining the CPU utilization of multiple servers in the target server cluster and calculating the average CPU utilization of the multiple servers, so as to determine the utilization standard deviation of the target server cluster based on the average utilization; calculating the diagnostic misjudgment rate of the fault diagnosis model based on the controller operating data and the configuration parameters, and determining the constraint penalty of the target server cluster; and constructing a corresponding fitness function from the normalized power consumption data, the utilization standard deviation, the constraint penalty, and the diagnostic misjudgment rate, so as to calculate the current fitness value of the current particle in the particle swarm.
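Claim 8 names the four ingredients of the fitness function but not how they are combined. The sketch below assumes a weighted sum of the normalized power, the population standard deviation of CPU utilization and the misjudgment rate, plus the penalty, and negates the total so that a larger fitness is better, consistent with the "greater than" comparisons in claims 3 and 5. The weights `alpha`, `beta`, `gamma` and the linear over-budget `constraint_penalty` are hypothetical.

```python
import statistics
from typing import Sequence

def constraint_penalty(total_power_w: float, power_budget_w: float, weight: float = 10.0) -> float:
    """Hypothetical penalty: proportional to the fraction by which power exceeds the budget."""
    return weight * max(0.0, total_power_w - power_budget_w) / power_budget_w

def fitness(total_power_w: float, max_power_w: float,
            cpu_utils: Sequence[float], misjudgment_rate: float, penalty: float,
            alpha: float = 0.4, beta: float = 0.3, gamma: float = 0.3) -> float:
    """Higher is better: negated weighted cost of power, load imbalance and misdiagnosis."""
    if max_power_w <= 0:
        raise ValueError("max_power_w must be positive")
    p_norm = total_power_w / max_power_w                  # normalized power consumption
    mean_util = statistics.fmean(cpu_utils)               # average CPU utilization
    sigma_u = statistics.pstdev(cpu_utils, mu=mean_util)  # utilization standard deviation
    cost = alpha * p_norm + beta * sigma_u + gamma * misjudgment_rate + penalty
    return -cost
```

With 500 W drawn against a 1000 W maximum, perfectly balanced utilization, no misjudgments and no penalty, the sketch returns -0.2; lowering power or misjudgment rate raises the fitness.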

9. The server performance evaluation method according to claim 8, characterized in that the step of calculating the diagnostic misjudgment rate of the fault diagnosis model based on the controller operating data and the configuration parameters includes: injecting different types of faults into the fault diagnosis model, and diagnosing each fault with the fault diagnosis model to obtain corresponding fault diagnosis results; determining at least one target fault diagnosis result that meets the preset requirements for accurate fault diagnosis, and counting the number of faults in the at least one target fault diagnosis result; and calculating the diagnostic accuracy rate of the fault diagnosis model based on the number of faults, and calculating the diagnostic misjudgment rate of the fault diagnosis model based on the diagnostic accuracy rate.
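The fault-injection evaluation of claim 9 reduces to counting correct diagnoses. A minimal sketch, assuming that a diagnosis "meets the requirement for accurate fault diagnosis" when its label equals the injected fault type, and that the misjudgment rate is 1 minus the accuracy rate; the string labels and function name are hypothetical.

```python
from typing import Sequence

def misjudgment_rate(injected: Sequence[str], diagnosed: Sequence[str]) -> float:
    """Misjudgment rate = 1 - (correctly diagnosed faults / injected faults)."""
    if len(injected) != len(diagnosed):
        raise ValueError("each injected fault needs exactly one diagnosis result")
    if not injected:
        raise ValueError("no faults were injected")
    # Count diagnoses whose label matches the injected fault type.
    correct = sum(1 for truth, pred in zip(injected, diagnosed) if truth == pred)
    accuracy = correct / len(injected)
    return 1.0 - accuracy
```

Injecting four faults and correctly identifying three of them gives an accuracy of 0.75 and a misjudgment rate of 0.25.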

10. An electronic device, characterized by comprising: a memory for storing a computer program; and a processor for executing the computer program to implement the steps of the server performance evaluation method according to any one of claims 1 to 9.