Load balancing method and system for a server cluster

CN122195576A (Pending). Publication Date: 2026-06-12. JUNDE EARTH (BEIJING) TECHNOLOGY CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: JUNDE EARTH (BEIJING) TECHNOLOGY CO LTD
Filing Date: 2026-01-27
Publication Date: 2026-06-12


Figure CN122195576A_ABST


Abstract

The application discloses a load balancing method and system for a server cluster, in the technical field of load balancing. Instead of the traditional passive mode, in which scheduling relies only on the current instantaneous state, a time-series prediction model is deployed locally on each backend server and uses historical and current running status data to predict the processing latency for the next scheduling cycle. The load balancer can thus anticipate the load pressure each server is about to face and avoid, in advance, servers that are about to be overloaded, which significantly reduces the timeout rate and average response time of task requests. A new algorithm is also introduced to determine the parameters of the time-series prediction model before deployment, which improves prediction accuracy and therefore the accuracy of scheduling decisions.


Description

Technical Field

[0001] This invention belongs to the field of load balancing technology and specifically relates to a load balancing method and system for server clusters.

Background Technology

[0002] With the rapid development of internet services, the access volume and data volume carried by server clusters are growing exponentially. To ensure high availability and high performance, load balancing technology is particularly important. Traditional load balancing algorithms mainly include round-robin, random, and least-connection methods. While these methods are simple to implement, most are based on static rules or the current real-time state for scheduling, lacking the ability to predict the future state of the system.

[0003] In practical applications, server load often exhibits dynamic, time-varying, and bursty characteristics. For example, a server may have a low number of connections at one moment, but then a complex computing task may cause a surge in CPU utilization or a sharp increase in response latency. Traditional load balancing algorithms cannot detect such future changes, which can easily lead to some servers being overloaded while others are idle in subsequent scheduling cycles, resulting in slower overall service response or even service outages.

[0004] Furthermore, existing prediction-based load balancing methods often employ a single prediction model (such as a simple ARIMA model or a BP neural network). When dealing with complex, nonlinear server time-series data, these methods suffer from low prediction accuracy, slow convergence, and a tendency to become trapped in local optima, which leads to inaccurate latency predictions and in turn degrades load balancing decisions.

Summary of the Invention

[0005] This invention provides a load balancing method and system for server clusters to solve the technical problem of poor scheduling decisions in existing technologies.

[0006] In one aspect, the present invention provides a load balancing method for a server cluster, comprising: obtaining the running status data of each backend server in the server cluster and normalizing it to obtain normalized running status data; inputting, by each backend server, the normalized running status data into a time-series prediction model preset locally on that server, which outputs the predicted processing latency of the backend server for the next scheduling cycle; obtaining, through a relay server, the predicted processing latency and server resource usage information of each backend server, and computing the objective function value of each backend server from the predicted processing latency and server resource usage information; and obtaining the task execution weight of each backend server from its objective function value, and load balancing the tasks to be executed according to the task execution weights.

[0007] Further, obtaining the running status data of each backend server in the server cluster and normalizing it to obtain normalized running status data includes: for any backend server in the server cluster, collecting its CPU utilization, memory utilization, current number of connections, and average response latency at multiple past sampling time points as the backend server's running status data; and normalizing the running status data to obtain normalized running status data.

[0008] Further, the time-series prediction model preset locally on each backend server is a Long Short-Term Memory (LSTM) network.

[0009] Further, the following steps are performed before load balancing is executed for the first time: the relay server constructs the time-series prediction model from an LSTM network, builds a solution space from the network parameters of the LSTM network, and generates multiple different candidate solutions in the solution space as the first target solutions; the fitness of each first target solution is obtained, and the first optimal solution is determined from these fitness values; based on the first optimal solution, a variable-speed optimization search is performed on the first target solutions in the solution space to obtain the second target solutions; a random spiral information search is performed on the second target solutions in the solution space to obtain the third target solutions; the search merit of the third target solutions is obtained, and a greedy mutation search is performed on them based on the search merit to obtain the fourth target solutions; and the total number of training iterations is obtained and used to determine the training progress, that is, whether training is complete. If training is not complete, the fourth target solutions are used as the first target solutions in the next round of training. Once training is complete, a second optimal solution is obtained from the fourth target solutions and distributed to each backend server, which presets its time-series prediction model from the second optimal solution.

[0010] Further, based on the first optimal solution, the variable-speed optimization search performed on the first target solutions in the solution space yields the second target solutions according to two update formulas [formulas not reproduced]. The quantities in these formulas are: the i-th first target solution in the t-th training iteration; the i-th second target solution, where i = 1, 2, …, L and L is the total number of first target solutions; the first optimal solution; a random step size generated by a Lévy flight; a first and a second random number in (0,1); a randomly chosen first target solution that differs from the i-th one; a first coefficient, randomly set to 0.01 or -0.01; a speed adjustment term; a second coefficient, which decreases linearly from 0.95 to 0.01 as training progresses; a third and a fourth random number in (0,1); the preset maximum number of training iterations; and the Euclidean distance between two of these solutions.

[0011] Further, the random spiral information search performed on the second target solutions in the solution space yields the third target solutions according to a spiral update formula [formula not reproduced], in which: the m-th second target solution in the t-th training iteration is updated to the m-th third target solution; a randomly chosen second target solution that differs from the m-th one is used; a helical direction coefficient takes a random value in (-0.5, 0.5); e is the natural constant; a random angle is drawn from (0, 2π); π is pi; and v is the helical shape constant, set to 1.5 or 2.

[0012] Further, obtaining the search merit of the third target solutions and performing a greedy mutation search on them based on the search merit to obtain the fourth target solutions includes the following. The search merit of each third target solution is computed by two formulas [formulas not reproduced], whose quantities are: the n-th third target solution in the t-th training iteration; its search merit; its historical best value; the historical best value of the j-th third target solution in the t-th training iteration; a similarity; and the Euclidean distance between two of these solutions. The half of the third target solutions with the smaller search merit is taken out, and the remaining third target solutions are used directly as fourth target solutions. Each extracted third target solution is mutated by a mutation formula [formula not reproduced], whose quantities are the n-th mutation value, a fifth and a sixth random number in (0,1), and the worst solution. Finally, a greedy strategy selects either the third target solution or its corresponding mutation value as the fourth target solution.

[0013] Further, the objective function value of each backend server is obtained from the predicted processing latency and server resource usage information by a formula [formula not reproduced], in which: the objective function value of the k-th backend server is computed for k = 1, 2, …, K, where K is the total number of backend servers; the first, second, third, and fourth weighting coefficients sum to 1; the server resource usage information supplies the CPU utilization, memory utilization, and number of active connections of the k-th backend server; the predicted processing latency of the k-th backend server is used; and the formula also contains the maximum threshold for CPU utilization, the maximum threshold for memory utilization, the maximum threshold for the number of active connections, and the maximum allowable latency.

[0014] Further, obtaining the task execution weight of each backend server from its objective function value and load balancing the tasks to be executed according to the task execution weights includes: computing the task execution weight of each backend server from its objective function value by a formula [formula not reproduced], in which the task execution weight of the k-th backend server is expressed using the natural constant e, an adjustment parameter, and a base value for the weights; and, when a task arrives, selecting the M backend servers with the highest task execution weights as candidate servers and assigning the task to the candidate with the fewest active connections, where M is a preset constant less than K.

[0015] In another aspect, the present invention provides a load balancing system for a server cluster, comprising: a data acquisition module, configured to acquire the running status data of each backend server in the server cluster and normalize it to obtain normalized running status data; a latency prediction module, configured to input, through each backend server, the normalized running status data into the time-series prediction model preset locally on that server and output the predicted processing latency of the backend server for the next scheduling cycle; an optimization metrics module, configured to obtain, through the relay server, the predicted processing latency and server resource usage information of each backend server, and compute the objective function value of each backend server from them; and a load balancing module, configured to obtain the task execution weight of each backend server from its objective function value and load balance the tasks to be executed according to the task execution weights.

[0016] The load balancing method and system provided by this invention abandon the traditional passive mode of scheduling based solely on the current instantaneous state. A time-series prediction model deployed locally on each backend server uses historical and current running status data to predict the processing latency for the next scheduling cycle, allowing the load balancer to anticipate upcoming load pressure and avoid, in advance, servers that are about to be overloaded. This significantly reduces the timeout rate and average response time of task requests. In addition, a new algorithm is introduced to optimize the parameters of the time-series prediction model before deployment, which improves prediction accuracy and thus the accuracy of scheduling decisions.

Description of the Drawings

[0017] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

[0018] Figure 1 is a flowchart of a load balancing method for a server cluster provided in an embodiment of the present invention; Figure 2 is a schematic structural diagram of a load balancing system for a server cluster provided in an embodiment of the present invention.

[0019] The accompanying drawings show specific embodiments of the invention, which are described in more detail below. These drawings and descriptions are not intended to limit the scope of the invention in any way, but to illustrate the concept of the invention to those skilled in the art by reference to particular embodiments.

Detailed Description

[0020] Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. When the following description relates to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the invention as detailed in the appended claims.

[0021] The embodiments of the present invention will now be described in detail with reference to the accompanying drawings.

[0022] As shown in Figure 1, this embodiment of the invention provides a load balancing method for a server cluster, comprising: S101. Obtain the running status data of each backend server in the server cluster, and normalize it to obtain normalized running status data.

[0023] S102. Each backend server inputs the normalized running status data into its locally preset time-series prediction model, which outputs the predicted processing latency of that backend server for the next scheduling cycle.

[0024] This invention abandons the traditional passive scheduling model that relies solely on the current instantaneous state (such as the current number of connections). A Long Short-Term Memory (LSTM) network deployed locally on each backend server as the time-series prediction model uses historical and current running status data (CPU, memory, number of connections, historical latency) to predict the processing latency for the next scheduling cycle. This allows the load balancer to anticipate upcoming load pressure on the servers and avoid servers that are about to be overloaded, significantly reducing the timeout rate and average response time of task requests.

[0025] S103. Obtain, through the relay server, the predicted processing latency and server resource usage information of each backend server, and compute the objective function value of each backend server from them. S104. Based on the objective function values, obtain the task execution weight of each backend server, and load balance the tasks to be executed according to the task execution weights.

[0026] Optionally, a negative feedback and circuit breaker mechanism can be set. When the predicted latency of a server exceeds the preset circuit breaker threshold for G consecutive times (G is a preset constant), its weight is forcibly set to 0 and it is temporarily removed from the list of available servers until its status returns to normal.
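A minimal sketch of this negative-feedback mechanism, assuming one latency prediction per server per scheduling cycle and assuming that "returns to normal" means the prediction drops back to the threshold or below. The class and field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LatencyCircuitBreaker:
    """Hypothetical breaker: trips a server after G consecutive over-threshold latency predictions."""

    threshold_ms: float                    # preset circuit-breaker threshold on predicted latency
    g: int                                 # G: consecutive exceedances needed to trip
    recover_ms: Optional[float] = None     # assumed "back to normal" level; defaults to threshold_ms
    _streak: Dict[str, int] = field(default_factory=dict, init=False)
    _tripped: Set[str] = field(default_factory=set, init=False)

    def observe(self, server: str, predicted_ms: float) -> None:
        """Feed one prediction (one scheduling cycle) for one server."""
        if server in self._tripped:
            limit = self.threshold_ms if self.recover_ms is None else self.recover_ms
            if predicted_ms <= limit:          # status has returned to normal: re-admit
                self._tripped.discard(server)
                self._streak[server] = 0
            return
        if predicted_ms > self.threshold_ms:
            self._streak[server] = self._streak.get(server, 0) + 1
            if self._streak[server] >= self.g:
                self._tripped.add(server)
        else:
            self._streak[server] = 0           # the run of exceedances must be consecutive

    def apply(self, weights: Dict[str, float]) -> Dict[str, float]:
        """Force the weight of every tripped server to 0."""
        return {s: 0.0 if s in self._tripped else w for s, w in weights.items()}
```

The recovery rule is the main choice the text leaves open; a separate, lower `recover_ms` would add hysteresis so that a server hovering near the threshold is not repeatedly removed and re-admitted.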

[0027] In this embodiment, obtaining the running status data of each backend server in the server cluster and normalizing it includes: for any backend server in the server cluster, collecting its CPU utilization, memory utilization, current number of connections, and average response latency at multiple past sampling time points (for example, with a sampling interval of 1 second, each second is one sampling time point, and the average response latency over the previous 30 seconds can be collected) to obtain the backend server's running status data; and normalizing the running status data to obtain normalized running status data.

[0028] For example, monitoring probes can be used to collect real-time running status data from each backend server in the cluster. For any given backend server, the collected data includes CPU utilization (%), memory utilization (%), the current number of active connections, and the average response latency over the past 10 sampling points. To eliminate the influence of differing units, this data is normalized (e.g., by Min-Max normalization), which maps it to the [0, 1] interval. In this embodiment, each backend server can measure its own latency and then report the data to the relay server, which speeds up processing.
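A minimal sketch of the Min-Max step, assuming the four features listed above are stored one row per sampling point in the order CPU, memory, connections, latency. The row layout and the use of per-window minima and maxima are assumptions; fixed bounds taken from the training data are an equally plausible choice.

```python
from typing import List, Sequence


def min_max_normalize(window: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-Max normalize each feature column of a sampling window to [0, 1].

    window[t] = [cpu_pct, mem_pct, active_connections, avg_latency_ms] at sampling point t
    (feature order assumed for illustration).
    """
    n_features = len(window[0])
    lows = [min(row[j] for row in window) for j in range(n_features)]
    highs = [max(row[j] for row in window) for j in range(n_features)]
    normalized = []
    for row in window:
        norm_row = []
        for j, x in enumerate(row):
            span = highs[j] - lows[j]
            # A constant column carries no scale information; map it to 0.0.
            norm_row.append((x - lows[j]) / span if span > 0 else 0.0)
        normalized.append(norm_row)
    return normalized
```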

[0029] In this embodiment, the time-series prediction model preset locally on each backend server is a Long Short-Term Memory (LSTM) network. A trained LSTM network is pre-deployed on each backend server. The server builds a vector from its normalized running status data and inputs it into the LSTM model, which outputs the predicted processing latency of the server for the next scheduling cycle (e.g., the next 500 ms). Because an LSTM is good at retaining and capturing dependencies in time-series data, it can accurately track the trend of load changes.
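The inference step can be pictured with the following plain-Python sketch of a single-layer LSTM followed by a linear output. The class name `TinyLSTM`, the layer sizes, and the random placeholder weights are hypothetical; the text does not specify the network architecture.

```python
import math
import random
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _gate(W, U, b, x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Pre-activation W @ x + U @ h + b, written out in plain Python."""
    return [
        sum(w * xi for w, xi in zip(W[r], x)) + sum(u * hi for u, hi in zip(U[r], h)) + b[r]
        for r in range(len(b))
    ]


class TinyLSTM:
    """Single-layer LSTM with a linear head: normalized window -> predicted latency.

    Weights are random placeholders; in the described method they would come
    from the second optimal solution distributed by the relay server.
    """

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), candidate (g).
        self.params = {k: (mat(n_hidden, n_in), mat(n_hidden, n_hidden), [0.0] * n_hidden) for k in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def predict(self, window: Sequence[Sequence[float]]) -> float:
        h = [0.0] * self.n_hidden   # hidden state
        c = [0.0] * self.n_hidden   # cell state
        for x in window:            # one step per sampling time point
            pre = {k: _gate(W, U, b, x, h) for k, (W, U, b) in self.params.items()}
            i = [_sigmoid(z) for z in pre["i"]]
            f = [_sigmoid(z) for z in pre["f"]]
            o = [_sigmoid(z) for z in pre["o"]]
            g = [math.tanh(z) for z in pre["g"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Linear head maps the final hidden state to one latency value (normalized units).
        return sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out
```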

[0030] In this embodiment, the following steps are also performed before load balancing is executed for the first time: the relay server constructs the time-series prediction model from an LSTM network, builds a solution space from the network parameters of the LSTM network, and generates multiple different candidate solutions in the solution space as the first target solutions. For example, an LSTM network has multiple trainable parameters, each typically with a corresponding value range; together, these ranges form a high-dimensional solution space. Randomly drawing a value for each network parameter within its range and combining these values into a vector yields one first target solution; repeating this process yields multiple first target solutions.
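As an illustration, assume the solution vector holds every weight and bias of a single-layer LSTM plus a one-output linear head, and assume the same illustrative range [-0.5, 0.5] for every parameter. The solution-space dimension and the random initialization could then look like this (function names are hypothetical):

```python
import random
from typing import List, Sequence, Tuple


def lstm_param_count(n_in: int, n_hidden: int) -> int:
    """Trainable parameters of one LSTM layer (4 gates) plus a 1-output linear head."""
    per_gate = n_hidden * n_in + n_hidden * n_hidden + n_hidden   # W, U, b
    return 4 * per_gate + n_hidden + 1                             # + head weights and bias


def init_population(bounds: Sequence[Tuple[float, float]], size: int, seed: int = 0) -> List[List[float]]:
    """Generate `size` first target solutions; coordinate d is drawn uniformly from bounds[d]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for low, high in bounds] for _ in range(size)]
```

For 4 input features and 8 hidden units this gives a 425-dimensional solution space, which shows why the search operators below work on long parameter vectors rather than a handful of hyperparameters.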

[0031] Obtain the fitness of each first target solution, and determine the first optimal solution from these fitness values. Normalized historical running status data and the corresponding actual latency in the subsequent time window can be obtained in advance; that latency was actually measured. Using the normalized historical running status data as input and the actual latency as the expected output, the loss function value of each first target solution is computed. The fitness of a first target solution is the reciprocal of its loss function value plus a preset constant (such as 0.001), and the solution with the largest fitness is the first optimal solution.
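The fitness rule above can be written down directly. This sketch assumes mean squared error as the loss function (the text says only "loss function value") and uses a hypothetical `decode` callable that turns a parameter vector into a predictor.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]


def fitness(solution: Sequence[float],
            decode: Callable[[Sequence[float]], Predictor],
            inputs: Sequence[Sequence[float]],
            actual_latency: Sequence[float],
            eps: float = 0.001) -> float:
    """Fitness = 1 / (loss + eps); the loss is assumed to be mean squared error."""
    predict = decode(solution)   # hypothetical: build a model from the parameter vector
    loss = sum((predict(x) - y) ** 2 for x, y in zip(inputs, actual_latency)) / len(actual_latency)
    return 1.0 / (loss + eps)


def first_optimal(population: List[Sequence[float]],
                  score: Callable[[Sequence[float]], float]) -> Sequence[float]:
    """The solution with the largest fitness."""
    return max(population, key=score)
```

Adding the constant keeps the fitness finite when a solution fits the historical data perfectly, and caps it at 1/0.001 = 1000.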

[0032] Based on the first optimal solution, perform a variable-speed optimization search on the first target solutions in the solution space to obtain the second target solutions; perform a random spiral information search on the second target solutions in the solution space to obtain the third target solutions; obtain the search merit of the third target solutions and perform a greedy mutation search on them based on the search merit to obtain the fourth target solutions; and obtain the total number of training iterations and determine from it whether training is complete. For example, training is complete when the total number of training iterations is greater than or equal to the preset maximum number of training iterations; otherwise it is not.

[0033] If training is not complete, the fourth target solutions are used as the first target solutions in the next round of training. Once training is complete, a second optimal solution is obtained from the fourth target solutions and distributed to each backend server, which presets its time-series prediction model from it. For example, the network parameters in the second optimal solution are used as the final network parameters of the time-series prediction model, which is then deployed.
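The training loop of [0032] and [0033] reduces to the control-flow skeleton below. Each search phase is passed in as a function; the names and signatures are hypothetical, and the skeleton does not contain the patent's update formulas.

```python
from typing import Callable, List

Solution = List[float]
# A phase maps (population, first optimal solution, iteration t) to a new population.
Phase = Callable[[List[Solution], Solution, int], List[Solution]]


def optimize(population: List[Solution],
             fitness: Callable[[Solution], float],
             phases: List[Phase],
             t_max: int) -> Solution:
    """Run the search phases once per iteration until t_max iterations have been performed."""
    for t in range(t_max):                        # training is complete after t_max iterations
        first_optimal = max(population, key=fitness)
        for phase in phases:                      # variable-speed -> spiral -> greedy mutation
            population = phase(population, first_optimal, t)
        # population now holds the fourth target solutions, reused as next round's first ones
    return max(population, key=fitness)           # second optimal solution
```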

[0034] Existing prediction-based load balancing methods often employ a single prediction model (such as a simple ARIMA model or a BP neural network). When dealing with complex, nonlinear server time-series data, these methods suffer from low prediction accuracy, slow convergence, and a tendency to become trapped in local optima, which leads to inaccurate latency predictions and in turn degrades load balancing decisions. This application therefore provides a novel algorithm for optimizing the parameters of the time-series prediction model, so that it predicts latency more accurately and decision-making accuracy improves.

[0035] In this embodiment, based on the first optimal solution, the variable-speed optimization search performed on the first target solutions in the solution space yields the second target solutions according to two update formulas [formulas not reproduced]. The quantities in these formulas are: the i-th first target solution in the t-th training iteration; the i-th second target solution, where i = 1, 2, …, L and L is the total number of first target solutions; the first optimal solution; a random step size generated by a Lévy flight; a first and a second random number in (0,1); a randomly chosen first target solution that differs from the i-th one; a first coefficient, randomly set to 0.01 or -0.01; a speed adjustment term; a second coefficient, which decreases linearly from 0.95 to 0.01 as training progresses; a third and a fourth random number in (0,1); the preset maximum number of training iterations; and the Euclidean distance between two of these solutions.
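The update formulas themselves are not reproduced here, but several of their ingredients can be sketched. Mantegna's algorithm is used below as a common generator of Lévy-flight step sizes; the text does not say which generator it uses, and the exponent beta = 1.5 is an assumption. The first and second coefficients follow the values stated above.

```python
import math
import random


def mantegna_sigma(beta: float = 1.5) -> float:
    """Scale of the numerator Gaussian in Mantegna's algorithm for Levy-stable steps."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """One heavy-tailed step: mostly small, occasionally very long."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:                      # guard against a zero draw in the denominator
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def first_coefficient(rng: random.Random) -> float:
    """Randomly +0.01 or -0.01, as stated in the text."""
    return rng.choice((0.01, -0.01))


def second_coefficient(t: int, t_max: int, start: float = 0.95, end: float = 0.01) -> float:
    """Decreases linearly from 0.95 at t = 0 to 0.01 at t = t_max."""
    return start - (start - end) * t / t_max
```

The occasional very long Lévy steps are what [0036] relies on to escape local optima, while the shrinking second coefficient shifts the search from exploration toward refinement.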

[0036] By combining the variable-speed optimization search with the long jumps characteristic of Lévy flight, the algorithm can escape local optima, which prevents the model parameters from becoming trapped in local extrema. The variable-speed adjustment term also gives the algorithm strong global exploration early in training and shifts it toward fine-grained local search later, which accelerates convergence.

[0037] In this embodiment, the random spiral information search performed on the second target solutions in the solution space yields the third target solutions according to a spiral update formula [formula not reproduced], in which: the m-th second target solution in the t-th training iteration is updated to the m-th third target solution; a randomly chosen second target solution that differs from the m-th one is used; a helical direction coefficient takes a random value in (-0.5, 0.5); e is the natural constant; a random angle is drawn from (0, 2π); π is pi; and v is the helical shape constant, set to 1.5 or 2.
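Because the spiral formula is not reproduced in this text, the following is only an illustrative stand-in: a logarithmic-spiral move in the style of the whale optimization algorithm, built from the quantities listed above (a distinct random peer, a direction coefficient in (-0.5, 0.5), a random angle in (0, 2π), and the shape constant v). The exact form of the move is an assumption.

```python
import math
import random
from typing import List

Solution = List[float]


def spiral_search(pop: List[Solution], rng: random.Random, v: float = 1.5) -> List[Solution]:
    """Move each second target solution along a logarithmic spiral around a random peer (assumed form)."""
    if len(pop) < 2:
        raise ValueError("need at least two solutions to pick a distinct random peer")
    third: List[Solution] = []
    for m, x in enumerate(pop):
        peer = pop[rng.choice([j for j in range(len(pop)) if j != m])]  # random, different from x
        a = rng.uniform(-0.5, 0.5)              # helical direction coefficient
        theta = rng.uniform(0.0, 2 * math.pi)   # random angle
        radius = math.exp(v * a) * math.cos(theta)
        # Each coordinate lands on the spiral around the peer, scaled by the distance to it.
        third.append([p + abs(p - xi) * radius for xi, p in zip(x, peer)])
    return third
```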

[0038] The random spiral information search formula mimics the spiral trajectory of celestial motion, letting candidate solutions search in multiple dimensions around the current optimal solution. This explores the solution space more fully and further improves the accuracy of parameter optimization.

[0039] In this embodiment, obtaining the search merit of the third target solutions and performing a greedy mutation search on them based on the search merit to obtain the fourth target solutions includes the following. The search merit of each third target solution is computed by two formulas [formulas not reproduced], whose quantities are: the n-th third target solution in the t-th training iteration; its search merit; its historical best value; the historical best value of the j-th third target solution in the t-th training iteration; a similarity; and the Euclidean distance between two of these solutions. The half of the third target solutions with the smaller search merit is taken out, and the remaining third target solutions are used directly as fourth target solutions. Each extracted third target solution is mutated by a mutation formula [formula not reproduced], whose quantities are the n-th mutation value, a fifth and a sixth random number in (0,1), and the worst solution. Finally, a greedy strategy selects either the third target solution or its corresponding mutation value as the fourth target solution.
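The selection logic of the greedy mutation search (mutate the lower-merit half, keep a mutant only if it improves fitness) can be sketched as follows. The search-merit values are taken as given because their formula is not reproduced, and the mutation rule, which pulls toward the best solution and away from the worst following the description in [0040], is an assumed form.

```python
import random
from typing import Callable, List, Sequence

Solution = List[float]


def greedy_mutation(pop: List[Solution], merit: Sequence[float],
                    fitness: Callable[[Solution], float], rng: random.Random) -> List[Solution]:
    """Mutate the lower-merit half of the third target solutions; keep a mutant only if it is fitter."""
    ranked = sorted(range(len(pop)), key=lambda n: merit[n])
    weak = set(ranked[: len(pop) // 2])          # half with smaller search merit
    best = max(pop, key=fitness)
    worst = min(pop, key=fitness)
    fourth: List[Solution] = []
    for n, x in enumerate(pop):
        if n not in weak:
            fourth.append(x)                     # passes straight through as a fourth target solution
            continue
        r5, r6 = rng.random(), rng.random()
        # Assumed mutation: pull toward the best solution, push away from the worst.
        mutant = [xi + r5 * (b - xi) - r6 * (w - xi) for xi, b, w in zip(x, best, worst)]
        fourth.append(mutant if fitness(mutant) > fitness(x) else x)   # greedy selection
    return fourth
```

Because of the greedy comparison, no solution ever gets worse in this step, so the population's best fitness is non-decreasing.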

[0040] The greedy mutation search introduces a search-merit evaluation and applies targeted mutation to individuals with poor fitness. The mutation uses information from the worst and best solutions to guide its direction, which preserves population diversity while quickly eliminating inferior solutions and ensuring high-quality model parameters.

[0041] In this embodiment, the objective function value of each backend server is obtained from the predicted processing latency and server resource usage information by a formula [formula not reproduced], in which: the objective function value of the k-th backend server is computed for k = 1, 2, …, K, where K is the total number of backend servers; the first, second, third, and fourth weighting coefficients sum to 1; the server resource usage information supplies the CPU utilization, memory utilization, and number of active connections of the k-th backend server; the predicted processing latency of the k-th backend server is used; and the formula also contains the maximum threshold for CPU utilization, the maximum threshold for memory utilization, the maximum threshold for the number of active connections, and the maximum allowable latency.
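A minimal sketch of the objective function, assuming each of the four terms is the measured value divided by its maximum threshold and then weighted. This is consistent with the "weighted summation" described in [0042] and the quantities listed above, but the exact formula is not reproduced. The default thresholds are illustrative values only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ServerSnapshot:
    cpu_pct: float          # CPU utilization (%)
    mem_pct: float          # memory utilization (%)
    active_conns: int       # number of active connections
    pred_latency_ms: float  # predicted processing latency for the next cycle


def objective(s: ServerSnapshot,
              alphas: Tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25),
              cpu_max: float = 90.0, mem_max: float = 90.0,
              conn_max: int = 1000, latency_max_ms: float = 500.0) -> float:
    """Weighted sum of four threshold-normalized load terms (assumed form); lower means less loaded."""
    if abs(sum(alphas) - 1.0) > 1e-9:
        raise ValueError("the four weighting coefficients must sum to 1")
    a1, a2, a3, a4 = alphas
    return (a1 * s.cpu_pct / cpu_max
            + a2 * s.mem_pct / mem_max
            + a3 * s.active_conns / conn_max
            + a4 * s.pred_latency_ms / latency_max_ms)
```

Under this form, a server at exactly half of every threshold scores 0.5 and a server at every threshold scores 1.0, regardless of how the weights are split.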

[0042] When calculating the objective function value, this invention considers not only the predicted processing latency but also CPU utilization, memory utilization, and the number of active connections, and evaluates each server's health and load capacity through a weighted sum. Compared with scheduling algorithms based on a single metric, this multi-dimensional evaluation avoids service unavailability caused by a bottleneck in a single resource (such as insufficient memory while the CPU is idle), achieving balanced and rational use of server cluster resources.

[0043] In this embodiment, obtaining the task execution weight of each backend server from its objective function value and load balancing the tasks to be executed according to the task execution weights includes: computing the task execution weight of each backend server from its objective function value by a formula [formula not reproduced], in which the task execution weight of the k-th backend server is expressed using the natural constant e, an adjustment parameter, and a base value for the weights; and, when a task arrives, selecting the M backend servers with the highest task execution weights as candidate servers and assigning the task to the candidate with the fewest active connections, where M is a preset constant less than K.
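The two-layer selection can be sketched as below. The weight formula is an assumption consistent with the quantities listed (natural constant e, an adjustment parameter, a base value): weight = exp(-lam * objective) + base, so that a lower objective value yields a higher weight. The parameter values are illustrative.

```python
import math
from typing import Dict


def task_weights(objectives: Dict[str, float], lam: float = 5.0, base: float = 0.01) -> Dict[str, float]:
    """Assumed form: weight = exp(-lam * objective) + base."""
    return {k: math.exp(-lam * f) + base for k, f in objectives.items()}


def pick_server(weights: Dict[str, float], active_conns: Dict[str, int], m: int) -> str:
    """Two-layer choice: top-M servers by weight, then the one with the fewest active connections."""
    if not 0 < m < len(weights):
        raise ValueError("M must satisfy 0 < M < K")
    ranked = [k for k in sorted(weights, key=weights.get, reverse=True) if weights[k] > 0.0]
    candidates = ranked[:m]                      # servers forced to weight 0 are never candidates
    if not candidates:
        raise RuntimeError("no available backend server")
    return min(candidates, key=lambda k: active_conns[k])
```

Filtering out zero-weight servers before taking the top M keeps servers removed by the circuit breaker out of the candidate set even when few servers remain healthy.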

[0044] In the task allocation phase, this invention first calculates the task execution weight from the objective function value, giving servers with better overall performance a higher probability of being selected. Then, from the M candidate servers with the highest weights, the one with the fewest active connections is selected to execute the task.

[0045] The first layer, weight-based filtering, ensures that tasks always flow toward the group of servers with the best overall performance; the second layer, connection-count fine-tuning, gives priority to the node with the fewest connections within that high-performance group. Much like balancing work and rest when assigning tasks to "high-achieving students", this prevents queuing and backlog on a server whose weight is high but whose connection count has suddenly risen, and further smooths out load spikes.

[0046] As shown in Figure 2, this embodiment of the invention provides a load balancing system for a server cluster, comprising: a data acquisition module 201, configured to acquire the running status data of each backend server in the server cluster and normalize it to obtain normalized running status data; a latency prediction module 202, configured to input, through each backend server, the normalized running status data into the time-series prediction model preset locally on that server and output the predicted processing latency of the backend server for the next scheduling cycle; an optimization index module 203, configured to obtain, through the relay server, the predicted processing latency and server resource usage information of each backend server, and compute the objective function value of each backend server from them; and a load balancing module 204, configured to obtain the task execution weight of each backend server from its objective function value and load balance the tasks to be executed according to the task execution weights.

[0047] The load balancing system for a server cluster provided by this invention can execute the above method, and its principle and beneficial effects are similar, so they will not be described in detail here.

[0048] Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common knowledge or customary techniques in the art not disclosed herein. It should be understood that the invention is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims

1. A load balancing method for a server cluster, characterized in that it comprises: obtaining the running status data of each backend server in the server cluster, and normalizing the running status data to obtain normalized running status data; inputting, on each backend server, the normalized running status data into the time series prediction model preset locally on that server, and outputting the predicted processing latency of the backend server in the next scheduling cycle; obtaining, through a relay server, the predicted processing latency and server resource usage information of each backend server, and obtaining the objective function value of each backend server based on the predicted processing latency and the server resource usage information; and obtaining, based on the objective function values, the task execution weight corresponding to each backend server, and performing load balancing of the tasks to be executed according to the task execution weights.

2. The load balancing method for a server cluster according to claim 1, characterized in that obtaining the running status data of each backend server in the server cluster and normalizing the running status data to obtain normalized running status data comprises: for any backend server in the server cluster, collecting the CPU utilization, memory utilization, current number of connections, and average response latency at multiple past sampling time points to obtain the running status data of that backend server; and normalizing the running status data to obtain the normalized running status data.
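The claim does not state which normalization is used. A minimal sketch, assuming per-feature min-max scaling over the sampling window, is shown below; the function name `min_max_normalize` and the sample values are hypothetical, and CPU and memory are given in percent only to keep the arithmetic exact.

```python
def min_max_normalize(window: list[list[float]]) -> list[list[float]]:
    """Scale each feature column of a sampling window to [0, 1].

    window[t] = [cpu_util, mem_util, connections, avg_latency_ms] at sample t.
    A constant column (max == min) maps to 0.0 to avoid division by zero.
    """
    n_features = len(window[0])
    lows = [min(row[j] for row in window) for j in range(n_features)]
    highs = [max(row[j] for row in window) for j in range(n_features)]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in window
    ]


samples = [[20, 50, 100, 12.0],   # three past sampling points of one backend
           [60, 50, 300, 20.0],
           [40, 50, 200, 16.0]]
normalized = min_max_normalize(samples)
```

Scaling over the window makes the four metrics comparable in magnitude. A deployed system would more likely fix the scaling bounds (for example, from the training data) so that the prediction model sees inputs on the same scale it was trained on.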

3. The load balancing method for a server cluster according to claim 1, characterized in that the time series prediction model preset on each backend server is a long short-term memory (LSTM) network.
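To show how an LSTM consumes the normalized window, here is a dependency-free forward pass of a single-layer LSTM with a linear output head. It is a sketch only: the class name `TinyLSTM`, the hidden size, and the random weights are assumptions, and no training is performed. A real deployment would more likely use a framework implementation such as PyTorch's `torch.nn.LSTM`.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM plus linear head: window of feature vectors in,
    one scalar (normalized predicted latency) out. Weights are random, untrained."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        width = n_in + n_hidden + 1  # concatenation [x_t, h_{t-1}, bias]
        # One weight row per hidden unit for each gate: i, f, g (candidate), o.
        self.w = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                         for _ in range(n_hidden)]
                  for gate in "ifgo"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]

    def predict(self, window: list[list[float]]) -> float:
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        for x in window:
            z = list(x) + h + [1.0]
            pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in rows]
                   for gate, rows in self.w.items()}
            i = [sigmoid(a) for a in pre["i"]]    # input gate
            f = [sigmoid(a) for a in pre["f"]]    # forget gate
            g = [math.tanh(a) for a in pre["g"]]  # candidate cell values
            o = [sigmoid(a) for a in pre["o"]]    # output gate
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        # Linear head on the final hidden state (plus bias).
        return sum(w * v for w, v in zip(self.w_out, h + [1.0]))


model = TinyLSTM(n_in=4, n_hidden=8, seed=1)
window = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 1.0, 1.0], [0.5, 0.0, 0.5, 0.5]]
y = model.predict(window)  # predicted latency for the next cycle, in normalized units
```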

4. The load balancing method for a server cluster according to claim 1, characterized in that, before load balancing is executed for the first time, the method further comprises: constructing, through the relay server, a time series prediction model using a long short-term memory network, and constructing a solution space based on the network parameters of the long short-term memory network; generating multiple different candidate solutions in the solution space to obtain the first target solutions; obtaining the fitness of each first target solution, and determining the first optimal solution based on these fitness values; performing, based on the first optimal solution, a variable-speed optimization search on the first target solutions in the solution space to obtain the second target solutions; performing a random spiral information search on the second target solutions in the solution space to obtain the third target solutions; obtaining the search quality of each third target solution, and performing a greedy mutation search on the third target solutions based on the search quality to obtain the fourth target solutions; obtaining the total number of training iterations and determining the training progress from it, the training progress indicating whether training is complete; if training is not complete, using the fourth target solutions as the first target solutions of the next training round; and when training is complete, obtaining a second optimal solution from the fourth target solutions and distributing it to each backend server, so that each backend server presets its time series prediction model based on the second optimal solution.
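The outer training loop of this claim can be outlined as a hypothetical skeleton. The encoded hyperparameters in `BOUNDS`, the names `optimize` and `toward_best`, and the toy fitness function are assumptions for illustration, and `toward_best` is a placeholder, not any of the claimed search steps. Fitness is assumed to be minimized (for example, the validation error of an LSTM trained with the candidate hyperparameters), and the loop keeps the best solution seen so far, which is also an assumption.

```python
import random
from typing import Callable

Vector = list[float]
Step = Callable[[list[Vector], Vector, int, int, random.Random], list[Vector]]

# Hypothetical solution space: (low, high) bounds of each encoded LSTM hyperparameter.
BOUNDS = [(8.0, 128.0),  # hidden units (rounded when decoded)
          (1e-4, 1e-2),  # learning rate
          (5.0, 60.0)]   # input window length in sampling points


def clip(x: Vector) -> Vector:
    return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, BOUNDS)]


def optimize(fitness: Callable[[Vector], float], steps: list[Step],
             pop_size: int = 20, max_iter: int = 50, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    # First target solutions: random candidates inside the solution space.
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    best = min(pop, key=fitness)  # first optimal solution
    for t in range(max_iter):
        for step in steps:  # e.g. variable-speed, spiral, greedy-mutation searches
            pop = [clip(x) for x in step(pop, best, t, max_iter, rng)]
        best = min(pop + [best], key=fitness)  # keep the best seen so far
    return best  # second optimal solution, to be distributed to the backends


def toward_best(pop, best, t, t_max, rng):
    """Placeholder step (not a claimed formula): random move toward the best."""
    return [[v + rng.random() * (b - v) for v, b in zip(x, best)] for x in pop]


TARGET = [64.0, 1e-3, 30.0]  # toy optimum standing in for the unknown real one


def toy_fitness(x: Vector) -> float:
    # Stand-in for the validation error of an LSTM trained with hyperparameters x.
    return sum(((v - g) / (hi - lo)) ** 2 for v, g, (lo, hi) in zip(x, TARGET, BOUNDS))


best = optimize(toy_fitness, [toward_best])
```

Because the best solution is carried across rounds, the returned solution is never worse than the initial first optimal solution. In practice each fitness evaluation trains an LSTM, so population size and iteration count dominate the cost of this offline phase.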

5. The load balancing method for a server cluster according to claim 4, characterized in that the variable-speed optimization search performed, based on the first optimal solution, on the first target solutions in the solution space obtains the second target solutions by the following two formulas:

[Formulas not reproduced in the source text]

where the symbols denote: the i-th first target solution in the t-th training iteration; the i-th second target solution, with i = 1, 2, …, L, where L is the total number of first target solutions; the first optimal solution; a random step size generated by Lévy flight; a first random number in (0, 1); a second random number in (0, 1); a randomly selected first target solution different from the i-th first target solution; a first coefficient for speed adjustment, randomly set to 0.01 or -0.01; a second coefficient that decreases linearly from 0.95 to 0.01 as training progresses; a third random number in (0, 1); a fourth random number in (0, 1); the preset maximum number of training iterations; and the Euclidean distance between two of these solutions.
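The claim names two standard ingredients that can be sketched: a Lévy-flight step size, generated here with Mantegna's algorithm (an assumption, since the claim does not say how the step is drawn), and a coefficient falling linearly from 0.95 to 0.01. The function `variable_speed_move` combines them in an illustrative way only; it is not the claimed formula, which the source text does not reproduce. All function names are hypothetical.

```python
import math
import random


def levy_step(rng: random.Random, beta: float = 1.5) -> float:
    """One Levy-distributed step length via Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)  # guard against v == 0


def second_coefficient(t: int, t_max: int) -> float:
    """Falls linearly from 0.95 at t = 0 to 0.01 at t = t_max."""
    return 0.95 - (0.95 - 0.01) * t / t_max


def first_coefficient(rng: random.Random) -> float:
    """Randomly 0.01 or -0.01, as the claim states."""
    return rng.choice((0.01, -0.01))


def variable_speed_move(x, best, x_rand, t, t_max, rng):
    """Illustrative combination only; NOT the formula of claim 5."""
    w = second_coefficient(t, t_max)
    c = first_coefficient(rng)
    step = levy_step(rng)
    # Heavy-tailed jump toward the best solution plus a small push relative to a random peer.
    return [xi + w * step * (bi - xi) + c * rng.random() * (xi - ri)
            for xi, bi, ri in zip(x, best, x_rand)]
```

The heavy tail of the Lévy distribution produces occasional long jumps that help escape local optima, while the shrinking coefficient shifts the search from exploration early in training to refinement late in training.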

6. The load balancing method for a server cluster according to claim 5, characterized in that the random spiral information search performed on the second target solutions in the solution space obtains the third target solutions by the following formula:

[Formula not reproduced in the source text]

where the symbols denote: the m-th second target solution in the t-th training iteration; the m-th third target solution; a randomly selected second target solution different from the m-th second target solution; the helical direction coefficient, a random value in (-0.5, 0.5); the natural constant e; a random angle in (0, 2π); the constant π; and the helical shape constant v, set to 1.5 or 2.
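As a hedged illustration, the spiral move below follows the logarithmic-spiral update of the Whale Optimization Algorithm, with a random peer in place of the best solution, the helical direction coefficient placed in the exponent, and a random angle in (0, 2π). This form is an assumption, since the claimed formula is not reproduced in the source text; the function names are hypothetical.

```python
import math
import random


def spiral_move(x, peer, rng, shape=1.5):
    """WOA-style logarithmic spiral around a peer solution (assumed form)."""
    a = rng.uniform(-0.5, 0.5)             # helical direction coefficient
    theta = rng.uniform(0.0, 2 * math.pi)  # random angle in (0, 2*pi)
    radius = math.exp(shape * a) * math.cos(theta)
    return [p + abs(p - xi) * radius for xi, p in zip(x, peer)]


def spiral_search(pop, rng, shape=1.5):
    """Move each second target solution along a spiral around a random other one."""
    if len(pop) < 2:
        raise ValueError("need at least two solutions")
    out = []
    for m, x in enumerate(pop):
        peer = rng.choice(pop[:m] + pop[m + 1:])  # random solution other than x
        out.append(spiral_move(x, peer, rng, shape))
    return out
```

With `shape` set to 1.5 or 2 and the direction coefficient in (-0.5, 0.5), the spiral radius stays within a factor of about e of the distance to the peer, so the move explores around existing solutions without unbounded jumps.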

7. The load balancing method for a server cluster according to claim 6, characterized in that obtaining the search quality of the third target solutions and performing a greedy mutation search on them based on the search quality to obtain the fourth target solutions comprises: obtaining the search quality of each third target solution by the following formulas:

[Formulas not reproduced in the source text]

where the symbols denote: the n-th third target solution in the t-th training iteration; its search quality; the historical best value; the historical best value of the j-th third target solution in the t-th training iteration; a similarity; and the Euclidean distance between two of these solutions; taking out the half of the third target solutions with the smaller search quality, and using the remaining third target solutions directly as fourth target solutions; mutating each extracted third target solution to obtain its mutated value:

[Formula not reproduced in the source text]

where the symbols denote: the n-th mutated value; a fifth random number in (0, 1); a sixth random number in (0, 1); and the worst solution; and using a greedy strategy to select either the extracted third target solution or its corresponding mutated value as a fourth target solution.
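The selection logic of this claim (mutate the half with the smaller search quality, pass the rest through unchanged, and keep a mutant only if it is better) can be sketched as follows. The search-quality function is passed in because its formula is not reproduced; the mutation rule, which pushes a solution away from the worst one, is an assumption, as is minimization of fitness. The function name is hypothetical.

```python
import random


def greedy_mutation(pop, fitness, quality, rng):
    """Return the fourth target solutions from the third target solutions `pop`."""
    worst = max(pop, key=fitness)  # worst solution (fitness is minimized)
    order = sorted(range(len(pop)), key=lambda n: quality(pop[n]))
    to_mutate = set(order[: len(pop) // 2])  # half with the smaller search quality
    out = []
    for n, x in enumerate(pop):
        if n not in to_mutate:
            out.append(x)  # passed through directly as a fourth target solution
            continue
        r5 = rng.random()
        mutant = [xi + r5 * (xi - wi) for xi, wi in zip(x, worst)]  # assumed mutation rule
        out.append(mutant if fitness(mutant) < fitness(x) else x)  # greedy selection
    return out


def f(x):
    return (x[0] - 10.0) ** 2  # toy fitness: distance to 10


pop = [[0.0], [5.0], [8.0], [9.0]]
fourth = greedy_mutation(pop, f, lambda x: -f(x), random.Random(0))
```

In this example the two lowest-quality solutions are mutated: `[0.0]` is itself the worst solution, so its mutant is identical and rejected, while `[5.0]` moves toward 10 and is accepted. The greedy step guarantees that no solution gets worse.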

8. The load balancing method for a server cluster according to claim 1, characterized in that the objective function value of each backend server is obtained from the predicted processing latency and the server resource usage information as follows:

$$F_k = w_1\frac{C_k}{C_{\max}} + w_2\frac{M_k}{M_{\max}} + w_3\frac{N_k}{N_{\max}} + w_4\frac{L_k}{L_{\max}}$$

where $F_k$ is the objective function value of the k-th backend server, $k = 1, 2, \ldots, K$, $K$ being the total number of backend servers; $w_1$, $w_2$, $w_3$ and $w_4$ are the first, second, third and fourth weighting coefficients, with $w_1 + w_2 + w_3 + w_4 = 1$; $C_k$, $M_k$ and $N_k$ are, respectively, the CPU utilization, memory utilization and number of active connections of the k-th backend server in the server resource usage information; $L_k$ is the predicted processing latency of the k-th backend server; $C_{\max}$ is the maximum threshold for CPU utilization; $M_{\max}$ is the maximum threshold for memory utilization; $N_{\max}$ is the maximum threshold for the number of active connections; and $L_{\max}$ is the maximum allowable latency.
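The objective can be transcribed as a weighted sum of threshold-normalized CPU, memory, connection, and latency terms (the form assumed here), together with a worked example of the single-resource bottleneck described in [0044]. The equal default weights, the thresholds, and the server figures are illustrative assumptions, and a lower value is taken to mean a lighter-loaded server.

```python
import math


def objective_value(cpu, mem, conns, latency_ms, *,
                    cpu_max, mem_max, conns_max, latency_max_ms,
                    weights=(0.25, 0.25, 0.25, 0.25)):
    """F_k = w1*C_k/C_max + w2*M_k/M_max + w3*N_k/N_max + w4*L_k/L_max (lower = lighter)."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    w1, w2, w3, w4 = weights
    return (w1 * cpu / cpu_max + w2 * mem / mem_max
            + w3 * conns / conns_max + w4 * latency_ms / latency_max_ms)


limits = dict(cpu_max=0.9, mem_max=0.9, conns_max=500, latency_max_ms=200.0)
# Server A: idle CPU but memory nearly exhausted. Server B: moderate on every metric.
f_a = objective_value(0.10, 0.95, 50, 80.0, **limits)  # 5/12, about 0.417
f_b = objective_value(0.50, 0.40, 50, 80.0, **limits)  # 0.375
```

A CPU-only scheduler would prefer server A (10% CPU against 50%), but the weighted sum ranks B as the lighter server because A's memory term already exceeds its threshold.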

9. The load balancing method for a server cluster according to claim 8, characterized in that obtaining the task execution weight of each backend server based on the objective function value and performing load balancing of the tasks to be executed according to the task execution weights comprises: obtaining the task execution weight of each backend server from its objective function value by the following formula:

[Formula not reproduced in the source text]

where the symbols denote: the task execution weight of the k-th backend server; the natural constant e; an adjustment parameter; and the base value of the weights; and, when a task arrives, selecting the M backend servers with the highest task execution weights as candidate servers and selecting, from the candidate servers, the server with the fewest active connections to execute the task, thereby achieving load balancing, where M is a preset constant smaller than K.
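This claim maps objective values to weights with an expression involving the natural constant, an adjustment parameter, and a base value, but the expression itself is not reproduced. One plausible form, assumed here, is an exponential decay with an additive floor, W_k = base + exp(-λ·F_k); the names `task_weights`, `lam`, and `base` and the sample values are hypothetical.

```python
import math
import random


def task_weights(objective_values, lam=5.0, base=0.05):
    """Assumed form W_k = base + exp(-lam * F_k): a lower F_k gives a higher weight."""
    return [base + math.exp(-lam * f) for f in objective_values]


F = [0.42, 0.375, 0.8]  # objective values of three hypothetical backends
W = task_weights(F)
# Claim 9 (deterministic): only the ranking of W matters when choosing the top-M candidates.
ranking = sorted(range(len(W)), key=lambda k: W[k], reverse=True)
# Reading of [0044] (probabilistic): pick a backend with probability proportional to W_k.
rng = random.Random(0)
chosen = rng.choices(range(len(W)), weights=W, k=1)[0]
```

Because the claim selects the top M servers deterministically, any strictly decreasing mapping yields the same candidate set there; the exact form of the mapping matters only if the weights are used as selection probabilities, as the wording of [0044] suggests.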

10. A load balancing system for a server cluster, characterized in that it comprises: a data acquisition module, configured to acquire the running status data of each backend server in the server cluster and to normalize the running status data to obtain normalized running status data; a latency prediction module, configured to input, on each backend server, the normalized running status data into the time series prediction model preset locally on that server, and to output the predicted processing latency of the backend server in the next scheduling cycle; an optimization index module, configured to obtain, through a relay server, the predicted processing latency and server resource usage information of each backend server, and to obtain the objective function value of each backend server based on the predicted processing latency and the server resource usage information; and a load balancing module, configured to obtain the task execution weight of each backend server based on its objective function value, and to perform load balancing of the tasks to be executed according to the task execution weights.