A server cluster elastic resource scheduling method based on reinforcement learning

By employing a reinforcement learning-based elastic resource scheduling method for server clusters, and utilizing graph attention networks and policy gradient modules to optimize resource allocation, this approach addresses the issues of resource waste and insufficiency in traditional scheduling methods, achieving efficient resource management and improved system stability.

CN122220109A · Pending · Publication Date: 2026-06-16 · WUHAN ZHONGAN ZHITONG TECH CO LTD


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: WUHAN ZHONGAN ZHITONG TECH CO LTD
Filing Date: 2026-04-14
Publication Date: 2026-06-16


IPC: G06F9/50; G06F18/241; G06F18/25; G06N3/092; G06N3/0442

AI Tagging

Application Domain

Resource allocation; Biological models


Abstract

The application relates to the technical field of resource scheduling, in particular to a server cluster elastic resource scheduling method based on reinforcement learning, which comprises the following steps: preset execution constraint conditions of cluster elastic resource scheduling are constructed; an elastic scheduling action of a current scheduling period is determined according to a historical scheduling experience value table and the preset execution constraint conditions; cluster monitoring data of a target resource cluster, a target stretch group and a target physical server are acquired; static features, dynamic features, task features and historical features of a target resource unit are extracted from the cluster monitoring data to obtain a feature set. The algorithm based on reinforcement learning continuously updates the historical scheduling experience table, so that the resource configuration can be automatically adjusted according to the actual resource use and service load, and the accumulation of historical experience can improve the precision and adaptability of the scheduling strategy.


Description

Technical Field

[0001] This invention relates to the field of resource scheduling technology, specifically to a server cluster elastic resource scheduling method based on reinforcement learning.

Background Technology

[0002] Currently, traditional resource scheduling methods are usually based on fixed rules and preset resource configurations. This makes it difficult for the system to make flexible adjustments when faced with uncertain and changing load demands. Even if some methods can make some predictions based on historical data, they cannot adapt to load fluctuations in real time, which can easily lead to waste or shortage of resources. Moreover, traditional scheduling strategies usually require manual adjustment or rely on long-term experience accumulation and lack a real-time feedback mechanism. Therefore, when the system load suddenly increases or resources become scarce, traditional methods often cannot react in time, which may lead to service delays or system instability.

[0003] Furthermore, traditional methods cannot optimize resource allocation in real time, which may lead to insufficient or excessive resource utilization. For example, server resources may be over-allocated under low load, while resources may be insufficient under high load, resulting in performance degradation and prolonged response time. Moreover, traditional scheduling methods usually rely on hard-coded rules or simple prediction algorithms, lacking in-depth analysis of multiple factors such as system load, task type, and hardware configuration. Unlike reinforcement learning-based methods, they cannot intelligently predict and dynamically adjust resource allocation to maximize the effective utilization of resources.

Summary of the Invention

[0004] To achieve the above objectives, the present invention provides the following technical solution: a server cluster elastic resource scheduling method based on reinforcement learning, comprising:
Construct preset execution constraints for cluster elastic resource scheduling; determine the elastic scheduling action for the current scheduling cycle based on the historical scheduling experience value table and the preset execution constraints;
Acquire cluster monitoring data of the target resource cluster, target scaling group, and target physical server; extract static features, dynamic features, task features, and historical features of the target resource unit from the cluster monitoring data to obtain a feature set;
Input the feature set into the resource scheduling reinforcement learning model to obtain the resource adjustment prediction result of the target physical server; based on the resource adjustment prediction result and the historical scheduling experience value table, generate the final elastic scaling instruction for the target physical server;
Obtain, through a preset reward function, real-time feedback information after the server cluster management system executes the final elastic scaling instruction, and update the current cluster topology information and current service load information of the server cluster management system;
Update the historical scheduling experience value table based on the current cluster topology information, the current service load information, and the real-time feedback information;
Determine whether there is an optimal experience value in the updated historical scheduling experience value table that meets the preset execution constraints of the next scheduling cycle; if there is an optimal experience value that meets the conditions, then execute the elastic resource scheduling of the next scheduling cycle based on the optimal experience value and the real-time cluster monitoring data of the next scheduling cycle.

[0005] Preferably, the cluster monitoring data includes resource configuration data and real-time performance data obtained by probes deployed in the server cluster to collect data from the target resource unit in real time. The static features, dynamic features, task features, and historical features of the target resource unit are extracted from the cluster monitoring data to obtain a feature set, including: The hardware list in the resource configuration data is input into the trained hardware recognition model to obtain the output hardware parameter recognition results, and the CPU architecture, memory capacity and disk type are determined as static features based on the hardware parameters. Based on the time-series stream of the real-time performance data, the current load, I / O rate, and network throughput are determined as dynamic features. Input the task description information in the queue of tasks to be processed into the trained task classification model, and obtain the task type and task priority as the task features. Extract the task dependencies from the task scheduling logs in the real-time performance data as task features; Historical load curves and resource adjustment strategies for different historical periods are extracted from the historical scaling records as historical features.

[0006] Preferably, the resource scheduling reinforcement learning model includes a graph attention network module, a long short-term memory network module, and a policy gradient module; the policy gradient module includes an actor network sub-model and a critic network sub-model; the resource adjustment prediction results include predicted values to characterize resource utilization, recommended scaling range, and expected response time changes; The feature set is input into the resource scheduling reinforcement learning model to obtain the resource adjustment prediction result of the target physical server, including: The feature set is respectively input into the graph attention network module, the long short-term memory network module, and the policy gradient module; The graph attention network module is used to extract resource topology association results based on the topology of the server cluster; The long short-term memory network module is used to extract load trend prediction results based on historical features; The policy gradient module uses dynamic features and task features to output a value assessment result for scaling actions; The resource adjustment prediction result is generated by integrating the resource topology association result, the load trend prediction result, and the scaling action value assessment result.

[0007] Preferably, the preset reward function includes cluster selection reward, grouping reward, and deployment reward; The real-time feedback information obtained by the server cluster management system after executing the final elastic scaling command is obtained through a preset reward function, including: When performing a capacity expansion operation, obtain the cluster resource redundancy and network latency data after selecting the target resource cluster; if the cluster resource redundancy is minimized while meeting the peak service demand and the network latency is minimized, a positive reward is generated. When performing grouping actions, obtain the status of associated processes and the data synchronization requirements of service instances within the target scaling group; if the associated processes of the same type of service and the service instances with data synchronization requirements are located in the same scaling group, and the resource contention instances are located in different scaling groups, then a positive reward is generated. When performing a deployment action, obtain the resource capacity limit of the target physical server and the number of physical servers launched; if it does not exceed the physical server resource capacity limit and the number of physical servers launched in the scaling group is the minimum, a positive reward is generated.

[0008] Preferably, based on the resource adjustment prediction results and the historical scheduling experience value table, a final elastic scaling instruction for the target physical server is generated, including: If the resource adjustment prediction result indicates that the resource utilization rate is lower than the threshold in the historical scheduling experience value table and the load continues to decrease, then the current resource configuration is maintained. If the resource adjustment prediction result indicates that the resource utilization rate is higher than the threshold in the historical scheduling experience value table and the load continues to increase, then an expansion instruction containing task migration paths is generated by combining the optimal scaling range in the historical scheduling experience value table.
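
The decision rule in [0008] can be sketched as a small function. This is a minimal illustration, not the patented implementation: the names `Prediction`, `ExperienceEntry`, and `final_instruction`, the numeric encoding of the load trend, and the handling of the cases the text does not mention are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Prediction:
    utilization: float  # predicted resource utilization, 0.0 to 1.0
    load_trend: float   # assumed encoding: > 0 load rising, < 0 load falling


@dataclass
class ExperienceEntry:
    utilization_threshold: float  # threshold stored in the experience value table
    best_scale_step: int          # optimal scaling range, here "servers to add"


def final_instruction(pred: Prediction, entry: ExperienceEntry,
                      migration_paths: List[str]) -> Dict:
    """Turn a prediction plus a table entry into an elastic scaling instruction."""
    above = pred.utilization > entry.utilization_threshold
    if above and pred.load_trend > 0:
        # High utilization and rising load: expand, carrying task migration paths.
        return {"action": "expand",
                "servers": entry.best_scale_step,
                "migration_paths": migration_paths}
    # The source only names "below threshold and load falling" as the maintain
    # case; treating every other combination as "maintain" is an assumption.
    return {"action": "maintain"}


expand = final_instruction(Prediction(0.9, 0.1), ExperienceEntry(0.7, 2), ["s1->s3"])
hold = final_instruction(Prediction(0.4, -0.2), ExperienceEntry(0.7, 2), [])
```

Keeping the threshold and the scaling range in the table entry, rather than in the function, mirrors the text's point that both come from the historical scheduling experience value table.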

[0009] Preferably, before updating the experience values in the historical scheduling experience value table based on the current cluster topology information, the current service load information, and real-time feedback information, the method further includes: Obtain the current hardware resource configuration information and current service deployment information of the server cluster to obtain the current cluster topology information; the current cluster topology information includes the hierarchical structure of resource clusters, resource scaling groups, physical servers, and the connection information between each level and shared storage; Obtain real-time resource usage data and current service request queue information of the server cluster to obtain current service load information.

[0010] Preferably, the current hardware resource configuration information and current service deployment information of the server cluster are obtained, including: The physical servers in the server cluster are divided according to whether they are connected to the same shared storage to obtain resource scaling groups; The resource scaling groups are divided according to whether they belong to the same geographical region and are interconnected with the network to obtain the corresponding resource clusters; The hardware structure of the server cluster is determined based on the hierarchical relationship between the resource cluster, the resource scaling group, and the physical server.
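
The two-level partition in [0010] can be sketched as follows. This is a simplified illustration under stated assumptions: each server record carries hypothetical `name`, `storage`, and `region` keys, a shared storage system sits in exactly one region, and "same region and interconnected network" is reduced to a single region label.

```python
from collections import defaultdict
from typing import Dict, List


def build_hierarchy(servers: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Return {resource cluster: {scaling group: [physical servers]}}."""
    # Level 1: servers attached to the same shared storage form a scaling group.
    groups = defaultdict(list)
    for server in servers:
        groups[server["storage"]].append(server["name"])

    # Level 2: scaling groups in the same region form a resource cluster.
    clusters = defaultdict(dict)
    for server in servers:
        clusters[server["region"]][server["storage"]] = groups[server["storage"]]
    return {region: dict(g) for region, g in clusters.items()}


hierarchy = build_hierarchy([
    {"name": "s1", "storage": "nas-a", "region": "region-1"},
    {"name": "s2", "storage": "nas-a", "region": "region-1"},
    {"name": "s3", "storage": "nas-b", "region": "region-1"},
    {"name": "s4", "storage": "nas-c", "region": "region-2"},
])
```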

[0011] Preferably, the preset execution constraints include a first adaptation rule constructed based on the matching relationship between the current server cluster's resource expansion demand and the cluster resource threshold dynamically generated based on preset elastic targets; a second adaptation rule based on preset elastic target settings to schedule related processes of the same type of service and service instances with data synchronization needs to the same resource scaling group; and a third adaptation rule based on preset elastic target settings to schedule service instances to the physical server with the highest resource idle rate in the resource scaling group. Based on the historical scheduling experience value table and the preset execution constraints, determine the elastic scheduling action for the current scheduling period, including: Determine the cluster selection experience value corresponding to the current scheduling period in the historical scheduling experience value table, so as to select, from the candidate resource clusters and based on the cluster selection experience value, the target resource cluster that satisfies the first adaptation rule and is used to perform the expansion action; Determine the scaling group selection experience value corresponding to the current scheduling period in the historical scheduling experience value table, so as to select a target scaling group that satisfies the second adaptation rule from each of the resource scaling groups in the target resource cluster based on the scaling group selection experience value; Determine the server selection experience value corresponding to the current scheduling period in the historical scheduling experience value table, and select a physical server that meets the third adaptation rule from each physical server in the target scaling group based on the server selection experience value.

[0012] Preferably, determining the cluster selection experience value corresponding to the current scheduling period in the historical scheduling experience value table, and selecting from the candidate resource clusters, based on the cluster selection experience value, a target resource cluster that satisfies the first adaptation rule and is used to perform the expansion action, includes: Obtain the resource expansion demand matching value of each candidate resource cluster recorded in the current service operation load information; Select candidate resource clusters whose resource expansion demand matching values meet the cluster resource threshold range as initial candidate clusters; Determine whether there are any resource clusters with existing service deployments among the initial candidate clusters; If there is only one resource cluster with existing service deployment in the initial candidate clusters, then that resource cluster is determined as the target resource cluster for performing the expansion action. If there are multiple resource clusters with existing service deployments in the initial candidate clusters, the resource cluster with the lowest network latency to the region from which the service requests originate is determined as the target resource cluster for performing the expansion action.
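
The selection steps in [0012] map onto a short filter-then-choose routine. The sketch below is illustrative only: the dictionary keys `match`, `has_service`, and `latency_ms` are hypothetical, and the text does not say what happens when no initial candidate already hosts the service, so that case returns `None` here.

```python
from typing import List, Optional, Tuple


def select_target_cluster(candidates: List[dict],
                          threshold_range: Tuple[float, float]) -> Optional[str]:
    """Pick the resource cluster that should receive the expansion action."""
    low, high = threshold_range
    # Step 1: keep clusters whose demand matching value lies in the threshold range.
    initial = [c for c in candidates if low <= c["match"] <= high]
    # Step 2: among those, look for clusters that already host the service.
    deployed = [c for c in initial if c["has_service"]]
    if len(deployed) == 1:
        return deployed[0]["name"]
    if len(deployed) > 1:
        # Step 3: several qualify, so prefer the lowest latency to the request source.
        return min(deployed, key=lambda c: c["latency_ms"])["name"]
    return None  # case not specified in the source


clusters = [
    {"name": "c1", "match": 0.82, "has_service": True, "latency_ms": 12},
    {"name": "c2", "match": 0.90, "has_service": True, "latency_ms": 7},
    {"name": "c3", "match": 0.40, "has_service": True, "latency_ms": 2},
    {"name": "c4", "match": 0.85, "has_service": False, "latency_ms": 1},
]
target = select_target_cluster(clusters, (0.8, 1.0))
```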

[0013] Preferably, determining whether there exists an optimal experience value in the updated historical scheduling experience value table that satisfies the preset execution constraints for the next scheduling cycle includes: Based on whether the actual value of each state parameter falls within the fluctuation range of each elastic scheduling action, the elastic scheduling actions that can be executed in the current scheduling cycle are determined to construct a candidate action set; the fluctuation range of the state parameters is the allowable fluctuation range of parameters corresponding to the current cluster topology information and the current service running load information in the current scheduling cycle. Obtain the experience value corresponding to each action in the candidate action set from the historical scheduling experience value table to obtain the experience value set corresponding to the current scheduling cycle; The optimal experience value corresponding to the current scheduling cycle is determined from the experience value set according to the maximum value filtering rule; If the reward function value corresponding to the optimal experience value is greater than the preset threshold, then it is determined that there exists an optimal experience value that satisfies the preset execution constraints of the next scheduling cycle.
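
The check in [0013] amounts to filtering actions by their allowed parameter ranges, taking the maximum experience value, and comparing the associated reward with a threshold. The following sketch assumes hypothetical data shapes (`table`, `fluctuation`, `rewards` as plain dictionaries keyed by action name); the source does not specify how these are stored.

```python
from typing import Dict, Optional, Tuple

Ranges = Dict[str, Tuple[float, float]]


def optimal_action(table: Dict[str, float], state: Dict[str, float],
                   fluctuation: Dict[str, Ranges], rewards: Dict[str, float],
                   reward_threshold: float) -> Optional[str]:
    """Return the action holding the optimal experience value, or None."""
    # Candidate action set: actions whose every parameter range contains the
    # actual value of the corresponding state parameter.
    candidates = [
        action for action, ranges in fluctuation.items()
        if all(lo <= state[param] <= hi for param, (lo, hi) in ranges.items())
    ]
    if not candidates:
        return None
    # Maximum value filtering rule over the experience value set.
    best = max(candidates, key=lambda action: table[action])
    # The optimal value only counts if its reward exceeds the preset threshold.
    return best if rewards[best] > reward_threshold else None


state = {"cpu": 0.8}
fluctuation = {"expand": {"cpu": (0.7, 1.0)},
               "shrink": {"cpu": (0.0, 0.3)},
               "hold": {"cpu": (0.0, 1.0)}}
table = {"expand": 0.9, "shrink": 0.95, "hold": 0.2}
rewards = {"expand": 1.0, "shrink": 1.0, "hold": 0.1}
chosen = optimal_action(table, state, fluctuation, rewards, reward_threshold=0.5)
```

Note that `shrink` has the highest table value in this example but is excluded because the current CPU utilization lies outside its allowed range.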

[0014] Compared with the prior art, the beneficial effects of the present invention are: (1) This invention continuously updates the historical scheduling experience value table through a reinforcement learning-based algorithm, which enables the automatic adjustment of resource configuration based on actual resource usage and service load. The accumulation of historical experience can improve the accuracy and adaptability of the scheduling strategy, ensuring that the cluster can flexibly respond to different time periods and different load conditions. Moreover, by comprehensively analyzing various characteristics such as hardware configuration, task type and system load, the method can predict a reasonable configuration of resources and dynamically adjust resource allocation. This helps to avoid resource waste and ensures that the server cluster is expanded when resources are scarce and contracted promptly when resources are in surplus, thereby improving resource utilization. (2) This invention obtains feedback information after system execution through a real-time feedback mechanism, thereby making rapid adjustments. This dynamic adjustment strategy based on the reward function allows the system to keep optimizing the resource allocation strategy when facing different environments and needs, improving system response speed and stability. By optimizing resource scaling decisions, the method can reduce service latency, reduce network latency and improve load balancing, thereby improving service quality and meeting the needs of different types of tasks and services. Especially during high load periods, it can maintain the stability and response speed of the system.

Attached Figure Description

[0015] Figure 1 is a schematic flowchart of the overall method in one embodiment of the present invention.

Detailed Implementation

[0016] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0017] Example 1: referring to Figure 1, this invention provides a technical solution: a server cluster elastic resource scheduling method based on reinforcement learning, comprising:
S1. Construct preset execution constraints for cluster elastic resource scheduling; determine the elastic scheduling action for the current scheduling cycle based on the historical scheduling experience value table and preset execution constraints;
S2. Obtain cluster monitoring data of the target resource cluster, target scaling group, and target physical server; extract static features, dynamic features, task features, and historical features of the target resource unit from the cluster monitoring data to obtain a feature set;
S3. Input the feature set into the resource scheduling reinforcement learning model to obtain the resource adjustment prediction result of the target physical server; based on the resource adjustment prediction result and the historical scheduling experience value table, generate the final elastic scaling instruction for the target physical server;
S4. Obtain, through a preset reward function, real-time feedback information after the server cluster management system executes the final elastic scaling instruction, and update the current cluster topology information and current service load information of the server cluster management system;
S5. Update the historical scheduling experience value table based on the current cluster topology information, current service load information, and real-time feedback information;
S6. Determine whether there is an optimal experience value in the updated historical scheduling experience value table that meets the preset execution constraints of the next scheduling cycle; if there is an optimal experience value that meets the conditions, then execute the elastic resource scheduling of the next scheduling cycle based on the optimal experience value and the real-time cluster monitoring data of the next scheduling cycle.
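
The source does not give a formula for how step S5 updates the historical scheduling experience value table. One common way to maintain such a table in reinforcement learning is the tabular Q-learning update, which is swapped in here purely for illustration. The state labels, action names, learning rate `alpha`, and discount factor `gamma` are all assumptions.

```python
from typing import Dict, Hashable, Iterable, Tuple

Table = Dict[Tuple[Hashable, str], float]


def update_experience(table: Table, state: Hashable, action: str, reward: float,
                      next_state: Hashable, actions: Iterable[str],
                      alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Tabular Q-learning update: Q <- Q + alpha * (r + gamma * max Q' - Q)."""
    best_next = max(table.get((next_state, a), 0.0) for a in actions)
    old = table.get((state, action), 0.0)
    new = old + alpha * (reward + gamma * best_next - old)
    table[(state, action)] = new
    return new


# Here "state" stands in for a summary of topology plus service load (S4/S5),
# and "reward" stands in for the real-time feedback from the reward function.
table: Table = {}
actions = ["expand", "hold", "shrink"]
first = update_experience(table, "high_load", "expand", 1.0, "normal_load", actions)
second = update_experience(table, "high_load", "expand", 1.0, "normal_load", actions)
```

With a table like this, step S6 would read the best entry for the next cycle's state, which matches the "optimal experience value" wording of the method.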

[0018] In an optional embodiment, the cluster monitoring data includes resource configuration data and real-time performance data obtained by probes deployed in the server cluster to collect data on the target resource unit in real time. The static, dynamic, task, and historical features of the target resource unit are extracted from the cluster monitoring data to obtain a feature set, including: Input the hardware list in the resource configuration data into the trained hardware recognition model to obtain the output hardware parameter recognition results, and determine the CPU architecture, memory capacity and disk type as static features based on the hardware parameters. Based on the time-series stream of real-time performance data, current load, I / O rate, and network throughput are determined as dynamic features. Input the task description information in the queue of tasks to be processed into the trained task classification model, and obtain the task type and task priority as task features. Extract task dependencies as task features from task scheduling logs in real-time performance data; Historical load curves and resource adjustment strategies for different historical periods are extracted from historical scaling records as historical features.

[0019] It should be noted that static features primarily originate from server hardware configuration data. These features are typically relatively fixed and do not change over time. For example, suppose a database service runs on a physical server. Hardware configuration data for this server (such as CPU type, memory size, and disk type) is collected, then identified and parsed by the hardware recognition model to determine the server's hardware characteristics. Examples include: CPU architecture (e.g., x86 or ARM); memory capacity (e.g., 16GB); and disk type (e.g., SSD or HDD). These static features help understand the maximum load and resource processing capabilities that the server can support.
Dynamic features are extracted from real-time performance data and typically change over time. These features reflect the current operational status, including load, I/O operations, and network usage. For example, suppose this database server is handling multiple concurrent requests; the following dynamic features would be monitored in real time: Current load: for example, the server's CPU utilization might be 80%, indicating a heavy load; I/O rate: if the server is performing a large number of disk read/write operations, the I/O rate might reach 100MB per second, indicating frequent disk operations; Network throughput: for example, the server's network bandwidth usage might be 500Mbps, indicating it is handling a large amount of external data exchange. These dynamic features help understand the server's current performance bottlenecks and resource consumption.
Task features are derived from the descriptions of tasks to be processed and from task scheduling logs. These features help understand the type and priority of tasks and the dependencies between them. For example, suppose a web service cluster has multiple different types of requests, such as user registration requests, file upload requests, and data query requests. The system will: take task descriptions from the task queue, process them through the trained task classification model, and output the task type (e.g., query task or upload task) and task priority (e.g., query task has high priority, upload task has low priority); extract dependencies between tasks from the task scheduling logs (for example, a data update task can only be executed after an upload task is completed); and use each identified dependency to optimize task scheduling so that tasks execute in the correct order.
Historical features, derived from historical scaling records, reflect past load and resource adjustment strategies. These features help predict and make decisions about future resource scheduling using historical data. For example, suppose there were multiple high load peaks during a certain period in the past. Historical scaling records would document the resource adjustment strategies adopted during these high load periods. For instance, historically, when CPU load exceeded 70%, two more servers were added to handle the increased load; when the load suddenly decreased during a certain period, the redundant servers were shut down to save resources.
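
The four feature categories described above can be gathered into one container, and the task dependency example (a data update task runs only after an upload task) can be resolved with a topological sort. This is a minimal sketch: the `FeatureSet` class and its field names are hypothetical, and the values are the illustrative figures from the paragraph above. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Dict, List, Set


@dataclass
class FeatureSet:
    static: Dict[str, object]      # from the hardware recognition model
    dynamic: Dict[str, float]      # from the real-time performance stream
    task: Dict[str, object]        # from the task classifier and scheduling logs
    historical: List[dict]         # from historical scaling records


features = FeatureSet(
    static={"cpu_arch": "x86", "memory_gb": 16, "disk_type": "SSD"},
    dynamic={"cpu_util": 0.80, "io_mb_per_s": 100.0, "net_mbps": 500.0},
    task={
        "priority": {"query": "high", "upload": "low"},
        # Maps each task to the set of tasks that must finish before it.
        "depends_on": {"data_update": {"upload"}},
    },
    historical=[{"cpu_load_above": 0.70, "servers_added": 2}],
)


def execution_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order tasks so that every task comes after its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(features.task["depends_on"])
```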

[0020] In an optional embodiment, the resource scheduling reinforcement learning model includes a graph attention network module, a long short-term memory network module, and a policy gradient module; the policy gradient module includes an actor network sub-model and a critic network sub-model; the resource adjustment prediction results include predicted values to characterize resource utilization, recommended scaling range, and expected response time changes; The feature set is input into the resource scheduling reinforcement learning model to obtain the resource adjustment prediction results for the target physical server, including: The feature set is input into the graph attention network module, the long short-term memory network module, and the policy gradient module, respectively. The graph attention network module is used to extract resource topology association results based on the topology of the server cluster; The long short-term memory network module is used to extract load trend prediction results based on historical features; The policy gradient module is used to output the value evaluation results of scaling actions based on dynamic features and task features; By integrating resource topology correlation results, load trend prediction results, and scaling action value assessment results, resource adjustment prediction results are generated.

[0021] It should be noted that, based on the server cluster's status information (including real-time performance data, resource configuration data, task characteristics, etc.), reinforcement learning algorithms are used to automatically adjust resources, thereby improving cluster performance and response time. Graph Attention Networks (GATs) are primarily used to process the topology data of server clusters. Servers in a cluster are typically interconnected (e.g., via network switches, routers, etc.), making the relationships and communication topology between servers crucial factors. GATs capture this complex topology through attention mechanisms, enabling each server to consider its position within the topology and its relationships with other servers when making decisions. For example, suppose a server cluster consists of multiple nodes, each representing a physical server, connected via a network. Some servers within the cluster have strong communication needs, while others communicate less. Graph Attention Networks assign different attention weights (attention values) to each pair of servers based on these connections, accurately identifying which servers have more intensive resource needs and which resource adjustments might significantly impact the overall cluster performance. LSTM is used to process and predict load trends in historical features. LSTM is a commonly used model for processing time series data, particularly suitable for capturing dependencies and trends over long periods. In resource scheduling, LSTM can predict future load changes based on historical load data, providing a basis for resource expansion or contraction. For example, suppose the load of a server has shown regular fluctuations during specific periods over the past few days (e.g., high load during the day and low load at night). 
By learning from this historical load data, LSTM can predict future load trends, helping schedulers predict whether the server may face high or low load in the next few hours. Based on this prediction, resource adjustment decisions can be made in advance, such as increasing processing capacity during the day and reducing resources at night. The policy gradient module, based on reinforcement learning principles, trains a model to determine how to adjust resources (e.g., increase or decrease the number of servers, adjust load balancing strategies, etc.). It comprises two sub-models: an Actor network and a Critic network. The Actor network is responsible for selecting specific resource adjustment actions (e.g., adding servers, removing servers, adjusting load, etc.). The Critic network is responsible for evaluating the effectiveness of these actions, assessing the merits of resource adjustment decisions by calculating a value function. For example, during task scheduling, the Actor network might choose to add two servers to handle an upcoming load spike, while the Critic network evaluates the effectiveness of this decision. For instance, if adding two servers effectively distributes the load and reduces response time, the Critic network will give a high evaluation score; if the effect is poor, the Critic network will provide a lower evaluation score, thus guiding the Actor network to adjust its strategy. The outputs of all modules are merged to form the final resource adjustment prediction result. Each module's output provides information from different dimensions to help comprehensively understand current resource needs and adjustment strategies. Resource topology correlation results: The GAT module identifies which server relationships require special attention. Load trend prediction results: The LSTM module predicts future load changes. 
Scaling action value assessment results: The policy gradient module evaluates the value of different resource adjustment decisions. This information is combined to generate specific resource adjustment prediction results, determining how to adjust cluster resources to optimize performance. This includes the following aspects: Predicted resource utilization: Predicting the current cluster resource utilization (e.g., CPU, memory); Scaling magnitude: Recommending the specific amount of resources to add or remove; Expected response time changes: How the expected response time will change after these adjustments.
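
To make the "attention weights for each pair of servers" idea concrete, the sketch below computes single-head graph attention coefficients in plain Python. It follows the standard GAT scoring form, a LeakyReLU over a learned vector applied to the concatenated features of two nodes, followed by a softmax over each node's neighbors. For brevity the learned linear transform of the features is omitted, and the feature values and the vector `a` are hand-picked, hypothetical numbers rather than trained parameters.

```python
import math
from typing import Dict, List


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_weights(features: Dict[str, List[float]],
                      neighbors: Dict[str, List[str]],
                      a: List[float]) -> Dict[str, Dict[str, float]]:
    """Return, for each server i, softmax-normalized weights over its neighbors j."""
    weights = {}
    for i, nbrs in neighbors.items():
        # Raw score e_ij = LeakyReLU(a . [h_i || h_j])
        scores = {
            j: leaky_relu(sum(w * x for w, x in zip(a, features[i] + features[j])))
            for j in nbrs
        }
        # Softmax over the neighborhood (shifted by the max for stability).
        top = max(scores.values())
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        weights[i] = {j: e / total for j, e in exps.items()}
    return weights


# Per-server features: [CPU utilization, traffic share] (illustrative values).
feats = {"s1": [0.8, 0.5], "s2": [0.2, 0.1], "s3": [0.9, 0.7]}
links = {"s1": ["s1", "s2", "s3"], "s2": ["s1", "s2"], "s3": ["s1", "s3"]}
att = attention_weights(feats, links, a=[0.5, 0.5, 1.0, 1.0])
```

In this example `s1` attends more strongly to the busy neighbor `s3` than to the idle `s2`, which is the kind of signal the GAT module is described as providing.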

[0022] In one alternative embodiment, the preset reward function includes cluster selection reward, grouping reward, and deployment reward; Real-time feedback information after the server cluster management executes the final elastic scaling command is obtained through a preset reward function, including: When performing a capacity expansion operation, obtain the cluster resource redundancy and network latency data after selecting the target resource cluster; if the cluster resource redundancy is minimized while meeting the peak service demand and the network latency is minimized, a positive reward is generated. When performing grouping actions, obtain the status of associated processes and the data synchronization requirements of service instances within the target scaling group; if the associated processes of the same type of service and the service instances with data synchronization requirements are located in the same scaling group, and the resource contention instances are located in different scaling groups, then a positive reward is generated. When performing a deployment action, obtain the resource capacity limit of the target physical server and the number of physical servers launched; if it does not exceed the physical server resource capacity limit and the number of physical servers launched in the scaling group is the minimum, a positive reward is generated.

[0023] It's important to note that the cluster selection reward is an evaluation of the chosen target resource cluster. The purpose of this reward function is to determine the rationality of the selected cluster based on its performance. Factors typically considered include cluster resource redundancy and network latency. Resource redundancy refers to the sufficiency of resources (such as CPU and memory) within the cluster. Excessive redundancy wastes resources, while insufficient redundancy can lead to performance bottlenecks. Network latency refers to the communication delay between nodes within the cluster. Lower latency means that resources within the cluster can collaborate more efficiently, improving overall performance. For example, suppose a web application's server cluster is scaling up and selects a new cluster to add resources. This cluster has a moderate amount of resource redundancy (e.g., remaining CPU and memory), just enough to meet future peak service demands, and low network latency. A positive reward is generated based on this information, indicating that the cluster selection is reasonable and helps improve performance. Conversely, if the cluster has excessive resource redundancy, wasting unnecessary resources, or if the cluster's network latency is too high, slowing down data transmission, a negative reward is generated, indicating that the selection is not ideal. Grouping rewards focus on how to rationally allocate service instances to scaling groups, especially considering the associated process states and data synchronization needs of service instances. The goal of this reward function is to place service instances and associated processes with data synchronization needs in the same scaling group, avoiding resource contention between service instances in different scaling groups. 
Associated process states refer to whether certain service instances need to frequently exchange or synchronize data; placing these processes in the same scaling group helps reduce data synchronization latency. Data synchronization needs refer to whether certain service instances have strong dependencies that require data synchronization; if they are assigned to different scaling groups, it may lead to resource contention and performance degradation. Example: Suppose an application's load needs to be provided by two servers, one handling database requests and the other handling computation. Because there is a strong data synchronization need between the service instances on these two servers (e.g., frequent data exchange between the database and computation), placing them in the same scaling group can effectively reduce data synchronization latency. If these service instances are placed in different scaling groups, it may lead to resource contention between them, thus affecting performance. In this case, a positive reward will be generated based on the optimization goal, indicating that such grouping configuration is effective. Deployment rewards focus on the resource rationality of launching service instances on physical servers, especially considering resource capacity limits and the number of physical servers launched. The goal of this reward function is to avoid exceeding the resource capacity limits of physical servers as much as possible, while also launching as few new servers as possible to save costs. Resource capacity limits refer to the maximum amount of resources that each physical server can handle; for example, a physical server has limits on its CPU, memory, and storage resources. Number of physical servers launched refers to the actual number of servers launched; fewer servers mean higher resource utilization and lower costs. 
Example: Suppose a service instance is to be deployed in a scaling group whose physical servers each have a fixed resource capacity limit and are already partly in use. The scheduler may launch additional physical servers to expand capacity, but its goal is to launch as few as possible. If the service demand can be met through reasonable resource allocation, without exceeding any server's resource capacity limit and with the minimum number of servers launched, a positive reward is generated, indicating that the deployment operation is reasonable. Conversely, if a server's resource capacity limit is exceeded or more physical servers are launched than necessary, a negative reward is generated, indicating that the operation is not optimal.
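The three reward components described above can be sketched as follows. This is a minimal illustration, not the claimed reward function: the reward values of +1 and -1, the dictionary field names, and the reading of "minimized" as "best among the feasible candidates" are all assumptions.

```python
from __future__ import annotations

import math


def cluster_selection_reward(chosen: dict, candidates: list[dict]) -> int:
    """+1 if the chosen cluster meets peak demand with the least redundancy and latency."""
    feasible = [c for c in candidates if c["meets_peak"]]
    if chosen not in feasible:
        return -1
    least_redundancy = min(c["redundancy"] for c in feasible)
    least_latency = min(c["latency_ms"] for c in feasible)
    is_best = (chosen["redundancy"] == least_redundancy
               and chosen["latency_ms"] == least_latency)
    return 1 if is_best else -1


def grouping_reward(group_of: dict[str, str],
                    sync_pairs: list[tuple[str, str]],
                    contention_pairs: list[tuple[str, str]]) -> int:
    """+1 if synchronizing instances share a group and contending instances do not."""
    sync_ok = all(group_of[a] == group_of[b] for a, b in sync_pairs)
    contention_ok = all(group_of[a] != group_of[b] for a, b in contention_pairs)
    return 1 if sync_ok and contention_ok else -1


def deployment_reward(demand: float, capacity_per_server: float,
                      servers_launched: int) -> int:
    """+1 if demand fits within capacity using the minimum number of servers."""
    minimum_needed = math.ceil(demand / capacity_per_server)
    within_capacity = demand <= servers_launched * capacity_per_server
    return 1 if within_capacity and servers_launched == minimum_needed else -1


# Hypothetical data for the database / computation example.
placement = {"db": "G1", "compute": "G1", "batch": "G2"}
example_grouping = grouping_reward(placement, [("db", "compute")], [("db", "batch")])
example_deployment = deployment_reward(demand=10, capacity_per_server=4, servers_launched=3)
```

In this sketch, launching four servers for the same demand, or only two, would both yield a negative deployment reward: the first wastes a server and the second exceeds the capacity limit.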

[0024] In an optional embodiment, based on resource adjustment prediction results and a historical scheduling experience value table, a final elastic scaling instruction for the target physical server is generated, including: If the resource adjustment forecast indicates that the resource utilization rate is lower than the threshold in the historical scheduling experience value table and the load continues to decrease, then maintain the current resource configuration; If the resource adjustment prediction results indicate that the resource utilization rate is higher than the threshold in the historical scheduling experience value table and the load continues to rise, then an expansion instruction containing task migration paths is generated by combining the optimal scaling range in the historical scheduling experience value table.

[0025] It's important to note that the analysis combines resource adjustment predictions (such as the utilization rates of CPU, memory, and storage) with historical scheduling experience tables (which record the scheduling strategies and thresholds for each server under different loads in the past). If the prediction results show that a server's resource utilization is lower than the threshold in historical experience, and the load trend continues to decline, then the current resources are sufficient, and increasing or decreasing resources is unnecessary. The existing resource configuration will be maintained to avoid waste or unnecessary operations. For example, suppose there is a server processing online orders. Historical scheduling experience tables show that resources are sufficient when CPU utilization is below 40%. Now, the predicted CPU utilization of this server is 35%, and the load has been declining over the past hour. Combining this information, it is determined that no expansion or task migration is needed; maintaining the existing configuration is sufficient. This saves operational costs and avoids unnecessary resource adjustments. When resource prediction shows that the server utilization is higher than the historical threshold and the load continues to rise, it indicates that the server resources may be insufficient and expansion is needed. At this time, the optimal scaling range in the historical scheduling experience table will be combined to determine the scope and strategy of expansion and generate instructions containing task migration paths. The forecast results indicate high load and a need for more resources. Historical experience tables are consulted to find the optimal scaling range (e.g., adding a few servers or a certain amount of resources) under similar load conditions. 
Scaling instructions are generated, and task migration paths are planned (which service instances are migrated to the new resources) to ensure business continuity and performance balance. Example: Assuming another server processing video transcoding, historical experience tables show that when CPU utilization exceeds 85% and load continues to rise, the optimal scaling strategy is to add two servers. The forecast for this server is that its CPU utilization is 90%, and the load continues to increase. Based on historical experience, scaling instructions are generated: add two servers; migrate some high-load tasks to the new servers to ensure the original server is not overloaded, while improving overall processing capacity; the migration path plans which service instances are migrated to which servers to avoid resource conflicts or performance degradation.
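The two decision branches above can be sketched as a small function. The class names, the round-robin assignment of tasks to new servers, and the fallback to "maintain" for situations the text does not cover are assumptions for illustration; the thresholds and the "add two servers" value are taken from the order processing and video transcoding examples.

```python
from dataclasses import dataclass, field


@dataclass
class ExperienceEntry:
    low_threshold: float    # utilization below which resources are sufficient
    high_threshold: float   # utilization above which expansion is considered
    servers_to_add: int     # optimal scaling range recorded for this load level


@dataclass
class Instruction:
    action: str
    servers_to_add: int = 0
    migration_paths: dict = field(default_factory=dict)


def generate_instruction(predicted_util: float, load_trend: float,
                         entry: ExperienceEntry, movable_tasks: list) -> Instruction:
    """Map a prediction plus a historical experience entry to a scaling instruction."""
    if predicted_util < entry.low_threshold and load_trend < 0:
        return Instruction("maintain")
    if (predicted_util > entry.high_threshold and load_trend > 0
            and entry.servers_to_add > 0):
        new_servers = [f"new-{i + 1}" for i in range(entry.servers_to_add)]
        # Assumed migration policy: spread movable tasks round-robin over new servers.
        paths = {task: new_servers[i % len(new_servers)]
                 for i, task in enumerate(movable_tasks)}
        return Instruction("expand", entry.servers_to_add, paths)
    # Cases not described in the text default to no change (assumption).
    return Instruction("maintain")


entry = ExperienceEntry(low_threshold=0.40, high_threshold=0.85, servers_to_add=2)
order_server = generate_instruction(0.35, -0.02, entry, [])
transcoding_server = generate_instruction(0.90, 0.03, entry, ["t1", "t2", "t3"])
```

Here `order_server` keeps the existing configuration, while `transcoding_server` carries an expansion instruction with two new servers and a migration path for each movable task.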

[0026] In an optional embodiment, before updating the experience values in the historical scheduling experience value table based on the current cluster topology information, the current service load information, and real-time feedback information, the method further includes: Obtain the current hardware resource configuration information and current service deployment information of the server cluster to obtain the current cluster topology information; the current cluster topology information includes the hierarchical structure of resource clusters, resource scaling groups, physical servers, and the connection information between each level and shared storage; Obtain real-time resource usage data and current service request queue information of the server cluster to obtain the current service load information.

[0027] It's important to note that cluster topology refers to the structure of a server cluster and how resources are allocated and managed at different levels. Understanding the cluster topology clarifies the structure of each physical server, resource group, and their relationships, thus revealing resource distribution and dependencies. A resource cluster is a group of servers sharing resources; they may be of the same type of hardware or clusters with specific task assignments. A resource scaling group refers to a set of servers that can elastically scale according to demand, typically used to handle load changes. The physical server hierarchy describes the location of physical servers within the cluster, potentially a high-level master cluster or subordinate sub-clusters, and their interrelationships. Connection information between each level and shared storage is crucial: different servers in the cluster may need to access shared storage resources; the topology information details how these storage resources are connected to ensure efficient service access. For example, suppose an enterprise's data center has multiple server clusters; each cluster has a set of master servers connected to a shared storage pool; some services may be deployed in a specific resource scaling group, automatically increasing or decreasing the number of servers based on load. By obtaining the current cluster topology information, we can understand which servers belong to the same resource scaling group and which servers are connected to the same storage pool, enabling rational planning of resource expansion and task migration. Service load information includes the actual resource usage of the service during runtime in the server cluster, such as CPU, memory, and storage usage, as well as the length of the service request queue. This information helps to assess the current service pressure in real time and whether resource adjustments or migration are needed. 
Real-time resource usage data refers to the current usage of server resources (such as CPU, memory, and disk). For example, a server may currently have high CPU utilization and low memory usage. Current service request queue information indicates the length of the current task queue that needs to be processed, such as the number of requests to be processed and service response time. If the request queue is too long, it means that the current server is overloaded and needs to be scaled up or the tasks migrated. Example: Suppose a server is running an order processing service for an e-commerce platform. Currently, it can be seen that the server's CPU utilization has reached 90%, while memory and disk usage are relatively low. At the same time, the service request queue is long, which means that more user requests need to be processed. If it continues to run on this server, it may affect the processing speed or cause latency. Therefore, by judging the current load through this real-time data, it is possible to decide whether resource expansion or task migration is needed. After acquiring real-time data on cluster topology and service load, this information can be used to update the historical scheduling experience value table. This table records the optimal scheduling strategies and resource utilization effects under different resource configurations in the cluster. These experience values are adjusted and optimized based on real-time feedback. The purpose of updating the historical experience value table is to assess whether the current resource configuration and historical experience values are still valid based on the current cluster topology and load. For example, if a server in a resource scaling group in the cluster frequently experiences resource shortages under a specific load, this may affect the optimal resource configuration in the historical scheduling experience. The scheduling strategy will be adjusted based on these changes.
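One way to picture the table update is shown below. The text does not give the update formula, so the exponential-moving-average rule, the learning rate `lr`, the `(topology, load bucket)` state key, and the 80% bucket boundary are all assumptions chosen to keep the sketch small.

```python
def load_bucket(cpu_util: float) -> str:
    """Coarse load level used as part of the table key (assumed discretization)."""
    return "high" if cpu_util >= 0.80 else "normal"


def update_experience(table: dict, topology: str, cpu_util: float,
                      action: str, reward: float, lr: float = 0.2) -> float:
    """Move the stored experience value toward the latest real-time feedback."""
    key = (topology, load_bucket(cpu_util), action)
    old = table.get(key, 0.0)
    table[key] = old + lr * (reward - old)
    return table[key]


experience_table = {}
# Hypothetical feedback: expanding scaling group G1 under 90% CPU paid off twice.
first = update_experience(experience_table, "C1/G1", 0.90, "add_server", reward=1.0)
second = update_experience(experience_table, "C1/G1", 0.90, "add_server", reward=1.0)
```

Repeated positive feedback raises the stored value step by step, while a server group that keeps running short of resources under the same load would see its value for the current configuration fall.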

[0028] In an optional embodiment, obtaining the current hardware resource configuration information and current service deployment information of the server cluster includes: The physical servers in the server cluster are divided according to whether they are connected to the same shared storage to obtain resource scaling groups; The resource scaling groups are divided according to whether they belong to the same geographical area and are interconnected with the network to obtain the corresponding resource clusters; The hardware structure of the server cluster is determined based on the hierarchical relationship between resource clusters, resource scaling groups, and physical servers.

[0029] It's important to note that a resource scaling group is a group of servers that can scale up or down together, typically sharing certain critical resources. The grouping is based on shared storage; that is, if several servers can access the same storage device (like a NAS or SAN storage), they are grouped together. The advantage of this is that when increased processing capacity is needed, servers can be flexibly added within the same scaling group without causing storage access conflicts. For example, a data center has six servers: S1, S2, S3, S4, S5, and S6. S1, S2, and S3 can all access storage device A; S4, S5, and S6 can all access storage device B. S1-S3 would be grouped into resource scaling group G1, and S4-S6 into resource scaling group G2. Thus, when order processing volume increases, server instances can be added to either G1 or G2, as they share the same storage, ensuring data consistency. Resource clusters are a higher-level organizational structure, typically composed of multiple resource scaling groups. The division is based on geographical location and network connectivity: scaling groups in the same region and with interconnected networks can form a cluster to ensure low latency and high data access efficiency. The advantages of this are: scaling groups within a cluster can communicate efficiently, facilitating task allocation and data synchronization. Example: Continuing the previous example, suppose: G1 is in a Beijing data center; G2 is in a Beijing data center; G3 is in a Shanghai data center. G1 and G2 in the Beijing data center have interconnected networks and low latency; G3 in the Shanghai data center forms a separate cluster. G1 and G2 will be assigned to resource cluster C1 (Beijing), and G3 will be assigned to resource cluster C2 (Shanghai). Thus, when a user in Beijing makes a request, scheduling prioritizes task allocation within C1, reducing cross-regional transmission latency. 
The hardware architecture is a complete hierarchical view of physical servers, scaling groups, and clusters: Bottom layer: physical servers; Middle layer: resource scaling groups (shared storage, scalable); Top layer: resource clusters (geographical region, network connectivity). This hierarchy clearly shows the location of each server, its scaling group, its cluster, and the relationships between them. Example: Beijing cluster C1; Scaling group G1: S1, S2, S3 (shared storage A); Scaling group G2: S4, S5, S6 (shared storage B); Shanghai cluster C2; Scaling group G3: S7, S8 (shared storage C). With this hardware architecture, scheduling can make the following decisions: When order processing requests in Beijing increase, prioritize scheduling servers in C1; when resources in C1 are strained, consider cross-cluster scheduling to C2; when a storage device becomes a bottleneck, tasks can be distributed within other scaling groups without affecting data consistency.
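The division into scaling groups and resource clusters can be sketched with the S1 to S8 example. In this illustration, groups are keyed by their shared storage device rather than named G1 to G3, and the region of a storage device is assumed to imply network interconnection; both simplifications, and all variable names, are assumptions.

```python
from collections import defaultdict

# Which shared storage each physical server is connected to (from the example).
server_storage = {
    "S1": "A", "S2": "A", "S3": "A",
    "S4": "B", "S5": "B", "S6": "B",
    "S7": "C", "S8": "C",
}
# Region of each storage device; same region is assumed to mean interconnected.
storage_region = {"A": "Beijing", "B": "Beijing", "C": "Shanghai"}


def build_hierarchy(server_storage: dict, storage_region: dict) -> dict:
    """Return {cluster region: {scaling group storage: [servers]}}."""
    groups = defaultdict(list)
    for server, storage in server_storage.items():
        groups[storage].append(server)          # same storage -> same scaling group
    clusters = defaultdict(dict)
    for storage, members in groups.items():
        clusters[storage_region[storage]][storage] = sorted(members)
    return dict(clusters)


hierarchy = build_hierarchy(server_storage, storage_region)
```

The resulting nested dictionary mirrors the three layers: regions at the top, scaling groups in the middle, and physical servers at the bottom.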

[0030] In an optional embodiment, the preset execution constraints include: a first adaptation rule constructed based on the matching relationship between the current server cluster's resource expansion demand and the cluster resource threshold dynamically generated based on a preset elastic target; a second adaptation rule, set based on the preset elastic target, to schedule related processes of the same type of service and service instances with data synchronization needs to the same resource scaling group; and a third adaptation rule, set based on the preset elastic target, to schedule service instances to the physical server with the highest resource idle rate in the resource scaling group. Determining the elastic scheduling action for the current scheduling period based on the historical scheduling experience value table and the preset execution constraints includes: Determine the cluster selection experience value corresponding to the current scheduling period from the historical scheduling experience value table, so as to select, based on the cluster selection experience value, the target resource cluster that meets the first adaptation rule from the candidate resource clusters and use it to perform the expansion action; Determine the scaling group selection experience value corresponding to the current scheduling period from the historical scheduling experience value table, and select the target scaling group that meets the second adaptation rule from the resource scaling groups in the target resource cluster based on the scaling group selection experience value; Determine the server selection experience value corresponding to the current scheduling period from the historical scheduling experience value table, and select the physical server that meets the third adaptation rule from the physical servers in the target scaling group based on the server selection experience value.

[0031] It should be noted that several scheduling rules are set in advance to guide scheduling decisions; they are all based on preset elasticity targets, that is, policies that specify under what circumstances resources are expanded or reduced. First adaptation rule: based on the current expansion needs and preset goals of the cluster, determine which cluster is most suitable for performing the expansion operation. For example: the current CPU utilization of Beijing cluster C1 is 90%, and that of C2 is 60%; the elastic goal is to expand when the CPU utilization exceeds 80%; C1 will be selected for expansion first because C1's needs best match the goal. Second adaptation rule: to reduce latency and data transmission costs, related processes of the same type of service and service instances that require frequent data synchronization are scheduled into the same scaling group. For example, if there are two database service instances, DB1 and DB2, that need to synchronize data, DB1 and DB2 are scheduled into the same scaling group G1, instead of being distributed across different scaling groups. Similarly, processes of the same web service are grouped together in one scaling group as far as possible. Third adaptation rule: when selecting physical servers within a scaling group, prioritize the server with the highest resource idle rate to improve utilization efficiency. For example, G1 contains three servers: S1, S2, and S3. S1 has a CPU idle rate of 50%, S2 has 30%, and S3 has 70%. New service instances will be scheduled to S3 because it has the highest idle rate. 
Instead of making blind choices, we refer to historical experience: in past scheduling, which clusters, scaling groups and servers were more effective under different conditions; First, select the clusters that meet the first adaptation rule (scaling requirement matching) from the candidate clusters, and then combine historical experience values to determine which cluster is most suitable; Example: Candidate clusters: C1 (Beijing), C2 (Shanghai); Scaling requirement matching: C1 meets the requirement, C2 is not urgent; Historical experience: During past peak periods, the Beijing cluster had a fast scaling response and high efficiency; Result: C1 is selected as the target cluster for scaling. Within the selected cluster, choose the scaling group most suitable for hosting the new service instance, following the second adaptation rule (service relevance and data synchronization requirements). Example: C1 has G1 and G2; G1 contains relevant database and web service instances, and data synchronization efficiency is high; historical experience: G1 has stable performance under high load; result: G1 is selected as the target scaling group. Within the scaling group, select the most suitable server, following the third adaptation rule (highest resource idle rate); Example: G1 contains S1, S2, and S3; S1 idle rate is 50%, S2 is 30%, and S3 is 70%; Historical experience: S3 has a fast scaling response and good stability; Result: Schedule new instances to S3.
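The three-level selection (cluster, then scaling group, then server) can be sketched as below, using the C1/G1/S3 example. How the experience values are combined with each adaptation rule is not specified in the text; here each rule acts as a filter or primary criterion and the experience value ranks what remains or breaks ties, which is an assumption, as are the field names and the numeric experience values.

```python
def select_target(clusters: list, expand_threshold: float, service: str) -> tuple:
    """Pick (cluster, scaling group, server) names for a new service instance."""
    # First rule: only clusters whose load exceeds the elastic expansion threshold.
    eligible = [c for c in clusters if c["cpu_util"] > expand_threshold]
    cluster = max(eligible, key=lambda c: c["experience"])
    # Second rule: only groups that already host related services.
    related = [g for g in cluster["groups"] if service in g["related_services"]]
    group = max(related, key=lambda g: g["experience"])
    # Third rule: highest idle rate first, experience value as tie-breaker.
    server = max(group["servers"], key=lambda s: (s["idle_rate"], s["experience"]))
    return cluster["name"], group["name"], server["name"]


clusters = [
    {"name": "C1", "cpu_util": 0.90, "experience": 0.8, "groups": [
        {"name": "G1", "related_services": {"db", "web"}, "experience": 0.9,
         "servers": [
             {"name": "S1", "idle_rate": 0.50, "experience": 0.6},
             {"name": "S2", "idle_rate": 0.30, "experience": 0.7},
             {"name": "S3", "idle_rate": 0.70, "experience": 0.9},
         ]},
        {"name": "G2", "related_services": {"batch"}, "experience": 0.5,
         "servers": [{"name": "S4", "idle_rate": 0.60, "experience": 0.5}]},
    ]},
    {"name": "C2", "cpu_util": 0.60, "experience": 0.7, "groups": []},
]

target = select_target(clusters, expand_threshold=0.80, service="db")
```

The sketch raises `ValueError` when no cluster or group passes a rule; a real scheduler would need an explicit fallback for that case, which the text does not describe.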

[0032] In an optional embodiment, determining the cluster selection experience value corresponding to the current scheduling period in the historical scheduling experience value table, and selecting the target resource cluster for performing the expansion action from the candidate resource clusters based on the cluster selection experience value and satisfying the first adaptation rule, includes: Obtain the resource expansion demand matching value of each candidate resource cluster recorded in the current service load information; Select candidate resource clusters whose resource expansion demand matching values meet the cluster resource threshold range as initial candidate clusters; Determine whether there are resource clusters with existing service deployments among the initial candidate clusters; If there is only one resource cluster with existing service deployment in the initial candidate clusters, then the resource cluster with existing service deployment will be determined as the target resource cluster for performing the expansion action. If there are multiple resource clusters with existing service deployments in the initial candidate clusters, the resource cluster with the lowest latency in the regional network from which the service request originates will be determined as the target resource cluster for performing the expansion action.

[0033] It should be noted that checking the current service load helps determine which resource clusters' expansion demand best matches the current service load. For example, suppose there are three clusters (C1, C2, and C3) with resource expansion demand matching values of 80%, 90%, and 70%, respectively; the current service load requirement is 85% (e.g., the CPU utilization of service instances reaches 85%), and it is necessary to find the clusters whose matching values fit this requirement. Candidate clusters whose matching values fall within the cluster resource threshold range are retained as initial candidate clusters; for example: suppose the preset resource expansion demand range is between 80% and 90%; C1 (80%) and C2 (90%) both fall within this range, so they become the initial candidate clusters; C3 (70%) does not, so it is excluded. Next, check whether the corresponding service instances have already been deployed in the initial candidate clusters; if not, further screening may be necessary; if so, consider them directly. For example, of C1 and C2, assume that C1 is a cluster with services already deployed, while C2 is a cluster without any services deployed. Although C2 meets the scaling requirements, it has no service instances, so it is temporarily excluded; C1 remains as the candidate cluster. If the initial candidate clusters contain only one cluster with existing service deployment, then that cluster will be directly selected as the expansion target. For example, C1 is currently the only cluster with existing service deployment; therefore, C1 will be directly selected as the target cluster for expansion. If multiple clusters in the initial candidate clusters already have service deployments, the cluster with the lowest network latency relative to the service request's origin region will be selected as the target cluster. 
For example, if both C1 and C2 have service instances deployed, but C1 is located in a local data center while C2 is located in an overseas data center, their network latencies will differ. Suppose the latency from the current service request's origin region (e.g., a city) to C1 is 50ms, and the latency to C2 is 150ms. C1 will be selected as the target cluster because its lower network latency allows it to process service requests more efficiently.
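The screening steps above can be sketched as a short function. The field names and the `None` return for the case where no candidate has an existing deployment (which the text leaves to "further screening") are assumptions; the numbers come from the C1, C2, C3 example.

```python
def choose_expansion_cluster(candidates: list, lower: float, upper: float):
    """Filter by matching value, then by existing deployment, then by latency."""
    # Step 1: keep clusters whose matching value lies within the threshold range.
    initial = [c for c in candidates if lower <= c["match"] <= upper]
    # Step 2: keep clusters that already host the service.
    deployed = [c for c in initial if c["has_service"]]
    if len(deployed) == 1:
        return deployed[0]["name"]
    if len(deployed) > 1:
        # Step 3: lowest latency to the region where the requests originate.
        return min(deployed, key=lambda c: c["latency_ms"])["name"]
    return None  # not covered by the text; further screening would be needed


candidates = [
    {"name": "C1", "match": 0.80, "has_service": True, "latency_ms": 50},
    {"name": "C2", "match": 0.90, "has_service": False, "latency_ms": 150},
    {"name": "C3", "match": 0.70, "has_service": True, "latency_ms": 30},
]
only_c1_deployed = choose_expansion_cluster(candidates, 0.80, 0.90)

both_deployed = [dict(c, has_service=True) for c in candidates]
lowest_latency = choose_expansion_cluster(both_deployed, 0.80, 0.90)
```

Note that C3 is excluded in both runs even though its latency is lowest, because its matching value falls outside the 80% to 90% range before latency is ever considered.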

[0034] In an optional embodiment, determining whether there exists an optimal experience value in the updated historical scheduling experience value table that satisfies the preset execution constraints for the next scheduling cycle includes: Based on the actual values of each state parameter within the fluctuation range of each elastic scheduling action, the elastic scheduling actions that can be executed in the current scheduling period are determined to construct a candidate action set; the fluctuation range of the state parameters is the allowable fluctuation range of the parameters corresponding to the current cluster topology information and the current service running load information in the current scheduling period. Obtain the experience value corresponding to each action in the candidate action set from the historical scheduling experience value table to obtain the experience value set corresponding to the current scheduling cycle; The optimal empirical value corresponding to the current scheduling cycle is determined from the empirical value set according to the maximum value filtering rule; If the reward function value corresponding to the optimal experience value is greater than the preset threshold, then it is determined that there exists an optimal experience value that satisfies the preset execution constraints of the next scheduling cycle.

[0035] It's important to note that the current cluster status and service load determine which elastic scheduling actions can be executed. Elastic scheduling actions can be understood as methods of adjusting resources, such as adding server instances, migrating services, and adjusting load balancing. The fluctuation range of status parameters refers to the allowed range of change, such as CPU utilization being allowed between 70% and 90%, memory usage between 60% and 85%, or the number of nodes allowed to be added or removed in the cluster topology. For example, suppose a service is currently running on cluster C1 with 85% CPU utilization and 80% memory utilization; CPU utilization is allowed to fluctuate between 80% and 90%, and memory usage between 75% and 85%. Executable elastic scheduling actions include: Action A: Add one server; Action B: Migrate part of the load to C2; Action C: Do not expand for now. These actions are all within the allowed fluctuation range, so they can all be included in the candidate action set. By reviewing historical data, we can identify the effects of past actions, which are known as experience values. Experience values reflect how well an action performed. For example: Action A (adding servers): historical experience value = 85; Action B (migrating load): historical experience value = 70; Action C (not expanding capacity): historical experience value = 60. Compare the experience values of the candidate actions and find the action with the highest value; this action is the optimal action because it has performed best historically; for example, in the example above, action A has the highest experience value (85), so its corresponding optimal experience value is 85. It checks whether the reward function value corresponding to the optimal experience value (which can be understood as the expected effect of the action) exceeds the preset threshold. If it does, it means that the action can be safely executed in the next scheduling cycle. 
For example: preset threshold = 80; optimal experience value = 85; 85 > 80, so it is determined that there is an optimal action that meets the execution constraints of the next scheduling cycle.
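The candidate-set construction and the maximum value filtering rule can be sketched as follows. As in the worked example, the sketch compares the experience value itself with the preset threshold; treating it as the reward function value, along with the function and parameter names, is an assumption.

```python
def find_optimal_action(state: dict, action_ranges: dict,
                        experience: dict, threshold: float):
    """Return (action, experience value) if the best candidate clears the threshold."""
    # Candidate set: actions whose allowed fluctuation ranges contain the current state.
    candidates = [
        action for action, ranges in action_ranges.items()
        if all(low <= state[param] <= high for param, (low, high) in ranges.items())
    ]
    if not candidates:
        return None
    # Maximum value filtering rule: keep the action with the highest experience value.
    best = max(candidates, key=lambda action: experience[action])
    if experience[best] > threshold:
        return best, experience[best]
    return None


state = {"cpu": 85, "mem": 80}
allowed = {"cpu": (80, 90), "mem": (75, 85)}
action_ranges = {"A_add_server": allowed, "B_migrate_to_C2": allowed, "C_no_expand": allowed}
experience = {"A_add_server": 85, "B_migrate_to_C2": 70, "C_no_expand": 60}

result = find_optimal_action(state, action_ranges, experience, threshold=80)
```

With a threshold of 80 the function returns action A with its value of 85; raising the threshold above 85 would yield `None`, meaning no optimal experience value satisfies the constraints for the next scheduling cycle.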

[0036] The embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited thereto. Various changes can be made within the scope of knowledge possessed by those skilled in the art without departing from the spirit of the present invention.

Claims

1. A server cluster elastic resource scheduling method based on reinforcement learning, characterized in that it comprises: Construct preset execution constraints for cluster elastic resource scheduling; Determine the elastic scheduling action for the current scheduling cycle based on the historical scheduling experience value table and the preset execution constraints; Acquire cluster monitoring data of the target resource cluster, target scaling group, and target physical server; Extract static features, dynamic features, task features, and historical features of the target resource unit from the cluster monitoring data to obtain a feature set; Input the feature set into the resource scheduling reinforcement learning model to obtain the resource adjustment prediction result of the target physical server; Generate the final elastic scaling instruction for the target physical server based on the resource adjustment prediction result and the historical scheduling experience value table; Obtain, through a preset reward function, real-time feedback information after the server cluster management system executes the final elastic scaling instruction, and update the current cluster topology information and current service load information of the server cluster management system; Update the historical scheduling experience value table based on the current cluster topology information, the current service load information, and the real-time feedback information; Determine whether there is an optimal experience value in the updated historical scheduling experience value table that meets the preset execution constraints of the next scheduling cycle; if there is an optimal experience value that meets the conditions, then execute the elastic resource scheduling of the next scheduling cycle based on the optimal experience value and the real-time cluster monitoring data of the next scheduling cycle.

2. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 1, characterized in that the cluster monitoring data includes resource configuration data and real-time performance data collected in real time from the target resource unit by probes deployed in the server cluster;
and extracting the static features, dynamic features, task features, and historical features of the target resource unit from the cluster monitoring data to obtain the feature set comprises:
inputting the hardware list in the resource configuration data into a trained hardware recognition model to obtain hardware parameter recognition results, and determining the CPU architecture, memory capacity, and disk type as the static features based on the recognized hardware parameters;
determining the current load, I/O rate, and network throughput as the dynamic features based on the time-series stream of the real-time performance data;
inputting the task description information in the pending task queue into a trained task classification model to obtain the task type and task priority as task features;
extracting the task dependencies from the task scheduling logs in the real-time performance data as task features; and
extracting historical load curves and resource adjustment strategies for different historical periods from the historical scaling records as the historical features.
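The four feature groups in this claim can be pictured as a small record type. The following is a minimal sketch, not the claimed implementation: the dictionary keys, the `FeatureSet` class, and the two stand-in model functions are hypothetical, and the trained hardware recognition and task classification models are replaced by trivial stubs.

```python
from dataclasses import dataclass

@dataclass
class FeatureSet:
    static: dict
    dynamic: dict
    task: dict
    historical: dict

def extract_features(monitoring, recognize_hardware, classify_task):
    """Assemble the four feature groups from one monitoring snapshot."""
    hardware = recognize_hardware(monitoring["hardware_list"])
    latest = monitoring["performance_stream"][-1]  # most recent sample
    task_type, priority = classify_task(monitoring["task_description"])
    return FeatureSet(
        static={k: hardware[k] for k in ("cpu_arch", "memory_gb", "disk_type")},
        dynamic={k: latest[k] for k in ("load", "io_rate", "net_throughput")},
        task={"type": task_type, "priority": priority,
              "dependencies": monitoring["task_dependencies"]},
        historical={"load_curve": monitoring["load_curve"],
                    "strategies": monitoring["past_strategies"]},
    )

# Stand-ins for the trained models; real ones would be learned classifiers.
def fake_hardware_model(hardware_list):
    return {"cpu_arch": "x86_64", "memory_gb": 64, "disk_type": "ssd"}

def fake_task_model(description):
    return ("batch", 2) if "nightly" in description else ("online", 1)

snapshot = {
    "hardware_list": ["cpu: x86_64", "ram: 64 GB", "disk: ssd"],
    "performance_stream": [
        {"load": 0.42, "io_rate": 120.0, "net_throughput": 800.0},
        {"load": 0.57, "io_rate": 150.0, "net_throughput": 950.0},
    ],
    "task_description": "nightly report aggregation",
    "task_dependencies": [("extract", "aggregate")],
    "load_curve": [0.3, 0.5, 0.7],
    "past_strategies": ["scale_out", "hold"],
}
features = extract_features(snapshot, fake_hardware_model, fake_task_model)
```

Passing the two models in as arguments keeps the extraction step independent of how those models are trained.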

3. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 2, characterized in that the resource scheduling reinforcement learning model includes a graph attention network module, a long short-term memory network module, and a policy gradient module; the policy gradient module includes an actor network sub-model and a critic network sub-model; and the resource adjustment prediction result includes a predicted value characterizing resource utilization, a recommended scaling range, and the expected change in response time;
and inputting the feature set into the resource scheduling reinforcement learning model to obtain the resource adjustment prediction result of the target physical server comprises:
inputting the feature set into the graph attention network module, the long short-term memory network module, and the policy gradient module respectively;
wherein the graph attention network module is used to extract a resource topology association result based on the topology of the server cluster;
the long short-term memory network module is used to extract a load trend prediction result based on the historical features;
the policy gradient module uses the dynamic features and the task features to output a scaling action value assessment result; and
the resource adjustment prediction result is generated by integrating the resource topology association result, the load trend prediction result, and the scaling action value assessment result.
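To give a feel for what the graph attention network module contributes, the sketch below aggregates the feature vectors of a server's topological neighbours with softmax attention weights. It is a deliberately simplified stand-in: plain dot-product scoring replaces the learned scoring function of a real graph attention layer, there are no trainable weights, and the node names and feature vectors are hypothetical.

```python
import math

def attention_aggregate(features, neighbours, node):
    """Softmax-weighted sum of neighbour feature vectors for one node.

    Scores are raw dot products between the node and each neighbour,
    which is a simplification of a learned graph attention score.
    """
    own = features[node]
    scores = [sum(a * b for a, b in zip(own, features[n]))
              for n in neighbours[node]]
    peak = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [
        sum(w * features[n][d] for w, n in zip(weights, neighbours[node]))
        for d in range(len(own))
    ]
    return aggregated, weights

# Hypothetical two-dimensional features for three servers.
features = {"s1": [1.0, 0.0], "s2": [1.0, 0.0], "s3": [0.0, 1.0]}
neighbours = {"s1": ["s2", "s3"]}
aggregated, weights = attention_aggregate(features, neighbours, "s1")
```

Here `s2` resembles `s1` more closely than `s3` does, so it receives the larger attention weight in the aggregated representation.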

4. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 3, characterized in that the preset reward function includes a cluster selection reward, a grouping reward, and a deployment reward;
and obtaining, by means of the preset reward function, the real-time feedback information after the server cluster management system executes the final elastic scaling instruction comprises:
when performing a capacity expansion action, obtaining the cluster resource redundancy and network latency data after the target resource cluster is selected, and generating a positive reward if the cluster resource redundancy is minimal while peak service demand is met and the network latency is minimal;
when performing a grouping action, obtaining the status of associated processes and the data synchronization requirements of service instances within the target scaling group, and generating a positive reward if the associated processes of the same type of service and the service instances with data synchronization requirements are located in the same scaling group while resource-contending instances are located in different scaling groups; and
when performing a deployment action, obtaining the resource capacity limit of the target physical server and the number of physical servers launched, and generating a positive reward if the resource capacity limit of the physical server is not exceeded and the number of physical servers launched in the scaling group is minimal.
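The three reward components can be written as three small predicates. This is a sketch under stated assumptions rather than the claimed reward function: the claim only says when a positive reward is generated, so the magnitude `1.0`, the zero reward otherwise, the data layouts, and the caller-supplied `min_servers_needed` are all hypothetical.

```python
def cluster_selection_reward(chosen, candidates, peak_demand):
    """Positive if the chosen cluster meets peak demand with the least
    spare capacity and the lowest latency among feasible clusters."""
    feasible = {n: c for n, c in candidates.items()
                if c["capacity"] >= peak_demand}
    if chosen not in feasible:
        return 0.0
    least_spare = min(c["capacity"] - peak_demand for c in feasible.values())
    least_latency = min(c["latency_ms"] for c in feasible.values())
    picked = feasible[chosen]
    ok = (picked["capacity"] - peak_demand == least_spare
          and picked["latency_ms"] == least_latency)
    return 1.0 if ok else 0.0

def grouping_reward(placement, affinity_pairs, contention_pairs):
    """Positive if affine instances share a group and contending ones do not."""
    together = all(placement[a] == placement[b] for a, b in affinity_pairs)
    apart = all(placement[a] != placement[b] for a, b in contention_pairs)
    return 1.0 if together and apart else 0.0

def deployment_reward(used, capacity, launched, min_servers_needed):
    """Positive if no server exceeds its capacity and no extra server runs."""
    within = all(used[s] <= capacity[s] for s in used)
    return 1.0 if within and launched == min_servers_needed else 0.0

candidates = {
    "c1": {"capacity": 120, "latency_ms": 8},
    "c2": {"capacity": 200, "latency_ms": 8},
    "c3": {"capacity": 90, "latency_ms": 3},   # cannot meet the peak demand
}
r_cluster = cluster_selection_reward("c1", candidates, peak_demand=100)
r_group = grouping_reward(
    {"api": "g1", "cache": "g1", "batch": "g2"},
    affinity_pairs=[("api", "cache")],
    contention_pairs=[("api", "batch")],
)
r_deploy = deployment_reward({"s1": 6}, {"s1": 8}, launched=1, min_servers_needed=1)
```

Keeping the three components separate makes it possible to credit the cluster, group, and server decisions individually.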

5. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 4, characterized in that generating the final elastic scaling instruction for the target physical server based on the resource adjustment prediction result and the historical scheduling experience value table comprises:
if the resource adjustment prediction result indicates that the resource utilization rate is lower than the threshold in the historical scheduling experience value table and the load continues to decrease, maintaining the current resource configuration; and
if the resource adjustment prediction result indicates that the resource utilization rate is higher than the threshold in the historical scheduling experience value table and the load continues to increase, generating an expansion instruction containing task migration paths in combination with the optimal scaling range in the historical scheduling experience value table.
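The two branches of this claim amount to a small decision rule. The sketch below is illustrative only: the field names, the `"rising"`/`"falling"` trend encoding, and the way migration paths are supplied are assumptions, and the combinations the claim does not address simply return `None`.

```python
def final_instruction(prediction, experience, migration_paths):
    """Map a prediction plus experience-table entries to a scaling instruction."""
    utilization = prediction["utilization"]
    trend = prediction["load_trend"]          # hypothetical: "rising" or "falling"
    threshold = experience["threshold"]

    if utilization < threshold and trend == "falling":
        return {"action": "maintain"}
    if utilization > threshold and trend == "rising":
        return {
            "action": "expand",
            "scale_by": experience["optimal_scale"],
            "migration_paths": migration_paths,
        }
    return None  # cases not covered by the claim are left undecided here

experience = {"threshold": 0.75, "optimal_scale": 2}
expand = final_instruction(
    {"utilization": 0.9, "load_trend": "rising"},
    experience,
    migration_paths=[("srv-a", "srv-c")],
)
maintain = final_instruction(
    {"utilization": 0.4, "load_trend": "falling"},
    experience,
    migration_paths=[],
)
```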

6. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 5, characterized in that, before updating the historical scheduling experience value table based on the current cluster topology information, the current service load information, and the real-time feedback information, the method further comprises:
obtaining the current hardware resource configuration information and current service deployment information of the server cluster to obtain the current cluster topology information, wherein the current cluster topology information includes the hierarchical structure of resource clusters, resource scaling groups, and physical servers, together with the connection information between each level and the shared storage; and
obtaining real-time resource usage data and current service request queue information of the server cluster to obtain the current service load information.

7. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 6, characterized in that obtaining the current hardware resource configuration information and current service deployment information of the server cluster comprises:
dividing the physical servers in the server cluster according to whether they are connected to the same shared storage, to obtain the resource scaling groups;
dividing the resource scaling groups according to whether they belong to the same geographical region and are interconnected over the network, to obtain the corresponding resource clusters; and
determining the hardware structure of the server cluster based on the hierarchical relationship among the resource clusters, the resource scaling groups, and the physical servers.
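The two-level partition described here is straightforward to sketch with dictionaries. In this illustration the record fields are hypothetical, network interconnection is approximated by a shared `network` identifier, and servers that share storage are assumed to sit in the same region and network.

```python
from collections import defaultdict

def build_hierarchy(servers):
    """Group servers by shared storage, then groups by (region, network)."""
    scaling_groups = defaultdict(list)
    group_location = {}
    for server in servers:
        scaling_groups[server["storage"]].append(server["name"])
        # Assumption: all servers on one shared storage share region and network.
        group_location[server["storage"]] = (server["region"], server["network"])

    clusters = defaultdict(dict)
    for storage, members in scaling_groups.items():
        clusters[group_location[storage]][storage] = members
    return dict(clusters)

servers = [
    {"name": "srv-a", "storage": "nas-1", "region": "east", "network": "net-1"},
    {"name": "srv-b", "storage": "nas-1", "region": "east", "network": "net-1"},
    {"name": "srv-c", "storage": "nas-2", "region": "east", "network": "net-1"},
    {"name": "srv-d", "storage": "nas-3", "region": "west", "network": "net-2"},
]
hierarchy = build_hierarchy(servers)
```

The result nests physical servers inside scaling groups inside resource clusters, which is the hierarchical relationship the claim uses to describe the hardware structure.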

8. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 7, characterized in that the preset execution constraints include:
a first adaptation rule, constructed based on the matching relationship between the resource expansion demand of the current server cluster and a cluster resource threshold dynamically generated from preset elastic targets;
a second adaptation rule, based on the preset elastic targets, that schedules associated processes of the same type of service and service instances with data synchronization requirements to the same resource scaling group; and
a third adaptation rule, based on the preset elastic targets, that schedules service instances to the physical server with the highest resource idle rate in the resource scaling group;
and determining the elastic scheduling action for the current scheduling cycle based on the historical scheduling experience value table and the preset execution constraints comprises:
determining the cluster selection experience value corresponding to the current scheduling cycle in the historical scheduling experience value table, so as to select, from the candidate resource clusters and based on the cluster selection experience value, a target resource cluster that satisfies the first adaptation rule and is used to perform the expansion action;
determining the scaling group selection experience value corresponding to the current scheduling cycle in the historical scheduling experience value table, so as to select, from the resource scaling groups in the target resource cluster and based on the scaling group selection experience value, a target scaling group that satisfies the second adaptation rule; and
determining the server selection experience value corresponding to the current scheduling cycle in the historical scheduling experience value table, so as to select, from the physical servers in the target scaling group and based on the server selection experience value, a physical server that satisfies the third adaptation rule.

9. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 8, characterized in that determining the cluster selection experience value corresponding to the current scheduling cycle in the historical scheduling experience value table, so as to select, from the candidate resource clusters and based on the cluster selection experience value, the target resource cluster that satisfies the first adaptation rule and is used to perform the expansion action, comprises:
obtaining the resource expansion demand matching value of each candidate resource cluster recorded in the current service load information;
selecting the candidate resource clusters whose resource expansion demand matching values fall within the cluster resource threshold range as initial candidate clusters;
determining whether any of the initial candidate clusters already has the service deployed;
if exactly one of the initial candidate clusters already has the service deployed, determining that resource cluster as the target resource cluster for performing the expansion action; and
if more than one of the initial candidate clusters already has the service deployed, determining the one with the lowest network latency to the service request source region as the target resource cluster for performing the expansion action.
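The cluster selection steps in this claim form a short filter-then-tie-break procedure, sketched below. The field names and sample values are hypothetical, and because the claim does not say what happens when no initial candidate has the service deployed, the sketch returns `None` in that case.

```python
def select_target_cluster(candidates, threshold_range):
    """Filter by matching value, then prefer clusters already running the service."""
    low, high = threshold_range
    initial = [c for c in candidates if low <= c["match"] <= high]
    deployed = [c for c in initial if c["has_service"]]

    if len(deployed) == 1:
        return deployed[0]["name"]
    if len(deployed) > 1:
        # Tie-break on latency to the region where requests originate.
        return min(deployed, key=lambda c: c["latency_ms"])["name"]
    return None  # not specified by the claim

candidates = [
    {"name": "c1", "match": 0.8, "has_service": True, "latency_ms": 12},
    {"name": "c2", "match": 0.9, "has_service": True, "latency_ms": 5},
    {"name": "c3", "match": 0.3, "has_service": True, "latency_ms": 1},
    {"name": "c4", "match": 0.7, "has_service": False, "latency_ms": 2},
]
target = select_target_cluster(candidates, threshold_range=(0.6, 1.0))
```

Cluster `c3` has the lowest latency overall but is excluded first because its matching value falls outside the threshold range.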

10. The server cluster elastic resource scheduling method based on reinforcement learning according to claim 9, characterized in that determining whether the updated historical scheduling experience value table contains an optimal experience value that satisfies the preset execution constraints of the next scheduling cycle comprises:
determining, based on the actual value of each state parameter within the fluctuation range of each elastic scheduling action, the elastic scheduling actions that can be executed in the current scheduling cycle, so as to construct a candidate action set, wherein the fluctuation range of the state parameters is the allowable fluctuation range of the parameters corresponding to the current cluster topology information and the current service load information in the current scheduling cycle;
obtaining the experience value corresponding to each action in the candidate action set from the historical scheduling experience value table, to obtain the experience value set corresponding to the current scheduling cycle;
determining the optimal experience value corresponding to the current scheduling cycle from the experience value set according to a maximum value filtering rule; and
if the reward function value corresponding to the optimal experience value is greater than a preset threshold, determining that there exists an optimal experience value that satisfies the preset execution constraints of the next scheduling cycle.
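The check in this claim can be sketched as three steps: build the candidate action set, take the maximum experience value, and compare the associated reward with a threshold. The sketch rests on one interpretation, namely that an action is executable when every state parameter lies inside that action's allowed range. The table is keyed by action only for brevity, and all names and numbers are hypothetical.

```python
def optimal_experience(table, state, action_ranges, rewards, reward_threshold):
    """Return the best executable action, or None if its reward is too low."""
    # Step 1: candidate actions whose parameter ranges contain the current state.
    candidates = [
        action for action, ranges in action_ranges.items()
        if all(low <= state[param] <= high
               for param, (low, high) in ranges.items())
    ]
    if not candidates:
        return None
    # Step 2: maximum value filtering over the experience values.
    best = max(candidates, key=lambda a: table.get(a, 0.0))
    # Step 3: accept only if the recorded reward exceeds the threshold.
    return best if rewards.get(best, 0.0) > reward_threshold else None

action_ranges = {
    "scale_out": {"cpu": (0.7, 1.0)},
    "hold": {"cpu": (0.0, 1.0)},
    "scale_in": {"cpu": (0.0, 0.3)},
}
table = {"scale_out": 0.9, "hold": 0.4, "scale_in": 0.7}
rewards = {"scale_out": 0.6}
state = {"cpu": 0.85}

best_action = optimal_experience(table, state, action_ranges, rewards, 0.5)
```

With a stricter reward threshold such as `0.8`, the same call returns `None`, which corresponds to the case where no optimal experience value satisfies the constraints for the next cycle.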