Method and apparatus for adjusting container replicas, storage medium, and electronic device

By acquiring service status and using intelligent prediction models and multi-level constraints to adjust the number of container replicas, the problem of frequent jitter in traditional container scaling strategies is solved, thereby improving system stability and resource efficiency in online business scenarios.

CN122309030A · Pending · Publication Date: 2026-06-30 · HUNAN HAPPLY SUNSHINE INTERACTIVE ENTERTAINMENT MEDIA CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: HUNAN HAPPLY SUNSHINE INTERACTIVE ENTERTAINMENT MEDIA CO LTD
Filing Date: 2026-03-30
Publication Date: 2026-06-30

IPC: G06F9/455

Technology Topics: Parallel computing; Online business

Abstract

This application discloses a method, apparatus, storage medium, and electronic device for adjusting container replicas. The method includes: obtaining the current state of each service used to carry a target business in the current period, the current state including the current number of container replicas, resource usage information, and response latency; obtaining the predicted number of container replicas in the next period by inputting the current multi-dimensional state vector corresponding to the current state into a target prediction model; performing a compliance check on the predicted number of container replicas based on multi-level constraints, the predicted number of container replicas indicating pending actions in the next period, the pending actions representing scaling up or down the number of container replicas; and adjusting the number of container replicas for each service in the next period from the predicted number to the target number of container replicas based on the compliance check results. This application solves the technical problem of poor stability of business systems caused by container scaling up or down in online business scenarios.

Description

Technical Field

[0001] This application relates to the field of cloud computing and container orchestration technology, and more specifically, to a method and apparatus for adjusting container replicas, a storage medium, and an electronic device.

Background Technology

[0002] In related technologies, container scaling mainly relies on mechanisms such as horizontal scaling, vertical scaling, and node-level elasticity. Taking Kubernetes as an example, the system monitors CPU, memory, or custom metrics (such as QPS and response latency), and triggers the increase or decrease of the number of Pod replicas when the metrics exceed preset thresholds; vertical scaling dynamically adjusts container resource requests and limits; and node-level elasticity automatically scales up or down the cluster nodes when resources are insufficient.

[0003] However, because the aforementioned container scaling strategies based on static thresholds or simple rules are mostly applied as one-off changes, they lack fine-grained control over the magnitude, frequency, and execution phase of those changes. Traditional scaling strategies therefore struggle to accurately predict sudden traffic spikes and periodic fluctuations in scenarios such as game services, which leads to frequent adjustments in the number of container replicas. These frequent scaling fluctuations result in poor system stability in online business scenarios.

[0004] There is currently no effective solution to the above problems.

Summary of the Invention

[0005] This application provides a method and apparatus for adjusting container replicas, a storage medium, and an electronic device to at least solve the technical problem of poor stability of business systems caused by container scaling in online business scenarios.

[0006] According to one aspect of the embodiments of this application, a method for adjusting container replicas is provided, comprising: obtaining the current state of each service used to carry a target business in the current period, wherein the current state includes the current number of container replicas, resource usage information, and response latency; obtaining the predicted number of container replicas in the next period by inputting the current multi-dimensional state vector corresponding to the current state into a target prediction model; performing a compliance check on the predicted number of container replicas based on multi-level constraints, wherein the predicted number of container replicas is used to indicate the actions to be executed in the next period, and the actions to be executed represent scaling up or scaling down the number of container replicas; and adjusting the number of container replicas of each service in the next period from the predicted number of container replicas to the target number of container replicas based on the compliance check results.

[0007] According to another aspect of the embodiments of this application, a container replica adjustment device is also provided, comprising: a first acquisition unit, configured to acquire the current state of each service used to carry the target business in the current period, wherein the current state includes the current number of container replicas, resource usage information, and response latency; a first processing unit, configured to obtain the predicted number of container replicas in the next period by inputting the current multi-dimensional state vector corresponding to the current state into the target prediction model; a second processing unit, configured to perform a compliance check on the predicted number of container replicas based on multi-level constraints, wherein the predicted number of container replicas is used to indicate the actions to be executed in the next period, and the actions to be executed represent expanding or shrinking the number of container replicas; and a first adjustment unit, configured to adjust the number of container replicas of each service in the next period from the predicted number of container replicas to the target number of container replicas based on the compliance check results.

[0008] According to another aspect of the embodiments of this application, a computer-readable storage medium is also provided, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when run by an electronic device, executes the above-described method for adjusting container replicas.

[0009] According to another aspect of the embodiments of this application, a computer program product is also provided, including a computer program that, when executed by a processor, implements the steps of the above-described method.

[0010] According to another aspect of the embodiments of this application, an electronic device is also provided, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the above-described method for adjusting container replicas through the computer program.

[0011] By employing the embodiments provided in this application, the current state of each service is obtained at a finer granular level, and the corresponding multi-dimensional state vector is input into the intelligent prediction model. This enables accurate identification of load fluctuation information for each service within different time shards (i.e., different periods). Simultaneously, combined with multi-level constraints, it effectively filters out aggressive changes that do not meet service quality requirements or are unreasonable. Finally, based on the compliance check results, the predicted container replica count is adjusted, transforming candidate actions into executable actions after constraints, thus avoiding meaningless and high-risk frequent changes. In other words, by combining intelligent decision-making by the prediction model with compliance checks of multi-level constraints, candidate scaling actions are filtered into a safe and controllable target container replica count, thereby suppressing frequent fluctuations in the container replica count. This not only reduces the frequency and amplitude of scaling fluctuations but also improves SLO compliance rate and system stability, achieving minimal resource consumption while ensuring service quality.

Attached Figure Description

[0012] The accompanying drawings, which are provided to further illustrate this application and form part of this application, illustrate exemplary embodiments of this application and are used to explain this application, but do not constitute an undue limitation of this application.

[0013] Figure 1 is a schematic diagram illustrating an application scenario of an optional method for adjusting container replicas according to an embodiment of this application;

[0014] Figure 2 is a flowchart of an optional method for adjusting container replicas according to an embodiment of this application;

[0015] Figure 3 is an overall schematic diagram of an optional method for adjusting container replicas according to an embodiment of this application;

[0016] Figure 4 is a schematic diagram of an optional container replica adjustment device according to an embodiment of this application;

[0017] Figure 5 is a schematic diagram of the structure of an optional electronic device according to an embodiment of this application.

Detailed Implementation

[0018] To enable those skilled in the art to better understand the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present application, and not all embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort should fall within the scope of protection of the present application.

[0019] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of this application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0020] The technical solutions in this application will comply with legal regulations during implementation. When operating according to the technical solutions in the embodiments, the data used will not involve user privacy, ensuring that the operation process is compliant and legal while guaranteeing data security. In addition, when the above embodiments of this application are applied to specific products or technologies, user permission or consent is required, and the collection, use, and processing of related data must comply with the relevant regulations and standards of the relevant countries or regions.

[0021] According to one aspect of the embodiments of this application, a method for adjusting container replicas is provided. As an optional implementation, the above-described method for adjusting container replicas can be applied to, but is not limited to, the application scenario shown in Figure 1. In the application scenario shown in Figure 1, the target terminal 102 can communicate with the server 106 via network 104, but is not limited to this. The server 106 can perform operations on the database 108, such as write or read data operations. The target terminal 102 may include, but is not limited to, a human-computer interaction screen, a processor, and a memory. The human-computer interaction screen may be used to display business screens rendered using the various services in this application's technical solution on the target terminal 102, such as game screens. The processor may be used to respond to the human-computer interaction operations, execute corresponding operations, or generate corresponding instructions and send the generated instructions to the server 106. The memory is used to store relevant processing data, such as the current number of container replicas, the current multi-dimensional state vector, and the target number of container replicas.

[0022] Optionally, in this embodiment, the target terminal can be a terminal configured with a target client, which may include, but is not limited to, at least one of the following: mobile phone (such as Android phone, iOS phone, etc.), laptop computer, tablet computer, PDA, MID (Mobile Internet Devices), PAD, desktop computer, smart TV, etc. The target client may be a video client, instant messaging client, browser client, educational client, etc. The network may include, but is not limited to, wired network and wireless network, wherein the wired network includes: local area network, metropolitan area network and wide area network, and the wireless network includes: Bluetooth, WIFI and other networks that enable wireless communication. The server may be a single server, a server cluster composed of multiple servers, or a cloud server.

[0023] The technical solution presented in this application can be widely applied to containerized online business scenarios with high requirements for service stability, resource costs, and corresponding performance coordination. It is particularly suitable for systems with periodic, volatile, and multidimensionally sensitive loads. Examples include online video live streaming and on-demand platforms, online game battle services, and the core link of the Dazu online transaction platform.

[0024] As described in the above embodiments, when applying traditional container scaling methods to online business scenarios, frequent scaling fluctuations easily occur, leading to poor stability of the business system. To address this issue, this application proposes a method for adjusting container replicas. Figure 2 is a flowchart of a method for adjusting container replicas according to an embodiment of this application, which includes the following steps S202 to S208.

[0025] It should be noted that the method for adjusting container replicas shown in steps S202 to S208 can be, but is not limited to being, performed by an electronic device, wherein the electronic device can be, but is not limited to, the target terminal or server shown in Figure 1.

[0026] Step S202: Obtain the current state of each service used to carry the target business in the current period, wherein the current state includes the current number of container replicas, resource usage information, and response latency;

[0027] Step S204: Obtain the predicted number of container replicas in the next period by inputting the current multi-dimensional state vector corresponding to the current state into the target prediction model;

[0028] Step S206: Based on multi-level constraints, perform a compliance check on the predicted number of container replicas, wherein the predicted number of container replicas is used to indicate the actions to be executed in the next period, and the actions to be executed represent scaling up or scaling down the number of container replicas;

[0029] Step S208: Based on the compliance check results, adjust the number of container replicas for each service in the next period from the predicted number of container replicas to the target number of container replicas.
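Steps S202 to S208 can be read as one pass of a control loop. The following Python sketch illustrates that flow only; `ServiceState`, `run_cycle`, `toy_predict`, and `toy_check` are hypothetical names introduced here, and the toy predictor and checker are simple stand-ins, not the target prediction model or the multi-level constraints of this application.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ServiceState:
    """Current state of one service (S202). Field names are illustrative."""
    replicas: int
    cpu_usage: float    # fraction of requested CPU in use
    latency_ms: float   # response latency observed in the current period


def run_cycle(
    state: ServiceState,
    predict: Callable[[List[float]], int],
    check: Callable[[int, int], int],
) -> int:
    """One decision pass: state vector -> prediction -> compliance check -> target."""
    vector = [float(state.replicas), state.cpu_usage, state.latency_ms]
    predicted = predict(vector)                # S204
    return check(state.replicas, predicted)    # S206 and S208


def toy_predict(vector: List[float]) -> int:
    """Stand-in predictor (assumption): add 3 replicas when latency is high."""
    replicas, _cpu, latency_ms = vector
    return int(replicas) + 3 if latency_ms > 200.0 else int(replicas)


def toy_check(current: int, predicted: int) -> int:
    """Stand-in compliance check (assumption): clamp to the range [2, 50]."""
    return max(2, min(50, predicted))
```

Passing the predictor and the checker as parameters keeps the loop itself independent of any particular model or constraint set.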

[0030] Before explaining the technical solution in this embodiment, the meaning of technical terms such as container and number of container replicas is briefly introduced.

[0031] A container is a lightweight virtualized runtime unit based on containerization technology, designed to run application service instances on a container orchestration and management platform. A container is an abstraction of a service instance. A service instance refers to an independently running containerized application process unit that carries a specific business function, providing independent service capabilities for that function. High availability, high concurrency, and load balancing are achieved through the parallel operation of multiple instances. Essentially, a service instance is an extended execution body of business logic, responsible for receiving requests, processing data, invoking dependent services, and returning responses. Multiple identical service instances together form a "service cluster," providing consistent service capabilities to the outside world.

[0032] The relationship between container replicas and service resources can be, but is not limited to, the following: changing the number of container replicas directly changes the total number of service instances running in the system, and thereby changes, linearly or nearly linearly, the total consumption of resources such as CPU, memory, network, and storage.

[0033] A service instance can refer to, but is not limited to, a running entity capable of independently providing service functions. Externally, it appears as an accessible service node. Each Pod typically hosts one service instance; therefore, the number of container replicas directly determines the number of service instances. In other words, as the number of container replicas increases, the number of service instances also increases, allowing the system (such as a game system) to handle more player requests simultaneously.

[0034] In this embodiment, a container elastic scaling method based on constraint awareness and backtracking correction is provided. Its core lies in building an intelligent, secure, and backtrackable closed-loop control mechanism to achieve accurate, stable, and low-cost elastic scheduling of containerized business systems.

[0035] The aforementioned acquisition of the current state of each service within the current period can refer to, but is not limited to, the system (taking a game system as an example) continuously collecting operational metrics of each service from the cluster monitoring system. These metrics include not only basic data such as the number of container replicas, CPU and memory usage, but also critical lifecycle data at the service level, such as response latency, error rate, request queue length, and instance creation and readiness time. For example, in a game business, the operational state of each "room server" service might include the number of currently online players, match queue length, number of sessions per instance, and response latency of critical links. This data is collected uniformly and aggregated according to fixed time windows (e.g., every 10 minutes) to form a current state that reflects service load trends and user experience quality. In other words, by using time shards as units, each current state is no longer a mere accumulation of single metrics, but a multi-dimensional comprehensive representation integrating performance, stability, and resource consumption, and is a real-time state dynamically updated over time, providing a true and reliable input basis for subsequent decision-making.
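As an illustration of the windowed aggregation described above, the following Python sketch folds raw monitoring samples into one state vector per fixed time window. The `Sample` fields, the choice of aggregates (latest replica count, mean CPU usage, worst latency, mean error rate), and the function name `state_vector` are assumptions made for this example; the application does not prescribe them.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

WINDOW_S = 600  # 10-minute window, matching the example in the text


@dataclass(frozen=True)
class Sample:
    """One monitoring sample for a service (illustrative fields)."""
    timestamp_s: int
    replicas: int
    cpu_usage: float
    latency_ms: float
    error_rate: float


def state_vector(samples: List[Sample], window_start_s: int) -> List[float]:
    """Aggregate the samples of one window into a multi-dimensional state vector."""
    window_end_s = window_start_s + WINDOW_S
    in_window = [s for s in samples if window_start_s <= s.timestamp_s < window_end_s]
    if not in_window:
        raise ValueError("no samples in the requested window")
    latest = max(in_window, key=lambda s: s.timestamp_s)
    return [
        float(latest.replicas),                   # replica count at the end of the window
        mean(s.cpu_usage for s in in_window),     # average CPU usage
        max(s.latency_ms for s in in_window),     # worst response latency
        mean(s.error_rate for s in in_window),    # average error rate
    ]
```

Samples that fall outside the window are ignored, so each call reflects exactly one time shard.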

[0036] After obtaining the current state of each service within the current period, the corresponding multi-dimensional state vector is input into the target prediction model (such as a reinforcement learning model). This model learns from the historical data of each service to predict whether to scale up or down in the current state. The model does not simply scale up based on CPU exceeding a threshold, but considers multiple dimensions of data. For example, the model might identify that "although the CPU is not overloaded, the response latency is continuously increasing and jitter is aggravated," thus determining that scaling up should be done in the next period adjacent to the current period; or it might identify that "it is currently the early morning off-peak period, and even if the number of replicas is slightly high, it is not advisable to immediately scale down, as the warm-up time may affect the morning surge in traffic." The model outputs scores for multiple candidate actions; for example, "adding 3 replicas" scores 0.85, "keeping it unchanged" scores 0.6, and "reducing 2 replicas" scores 0.2. The system will select the action with the highest score as the prediction target.

[0037] The aforementioned target prediction model can be used, but is not limited to, to determine the score of each candidate action in the candidate action set for each service based on the current state, where each candidate action represents an operation to expand or shrink the number of container replicas.

[0038] It is important to note that when the model confidence is insufficient (e.g., the difference between the scores of the best and second-best actions is too small) or the model fails, the system automatically falls back to a conservative strategy, such as "keeping it unchanged", to ensure that the system does not malfunction due to algorithm misjudgment.
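The score-based selection and the conservative fallback can be sketched as follows. This is a minimal Python illustration under stated assumptions: actions are represented as replica deltas, `min_margin` is a hypothetical confidence threshold, and `select_action` and `decide` are names introduced here rather than parts of the application.

```python
from typing import Callable, Dict, List

HOLD = 0  # replica delta that means "keep it unchanged"


def select_action(scores: Dict[int, float], min_margin: float = 0.1) -> int:
    """Pick the highest-scoring replica delta, or HOLD when confidence is low.

    Confidence is taken to be the gap between the best and second-best scores
    (an assumption that follows the example given in the text).
    """
    if not scores:
        return HOLD
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < min_margin:
        return HOLD
    return ranked[0][0]


def decide(
    model: Callable[[List[float]], Dict[int, float]],
    vector: List[float],
    min_margin: float = 0.1,
) -> int:
    """Query the model; fall back to HOLD if the model call fails."""
    try:
        scores = model(vector)
    except Exception:
        return HOLD
    return select_action(scores, min_margin)
```

With the scores from the example (add 3 replicas: 0.85, keep unchanged: 0.6, remove 2 replicas: 0.2), the gap of 0.25 exceeds the assumed margin, so the sketch selects the delta of +3.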

[0039] The compliance check on the predicted number of container replicas, based on multi-level constraints, is key to the automatic scaling mechanism implemented in this application. The system does not unconditionally execute model recommendations but applies multi-layered security filtering, including but not limited to checking whether resource quotas are sufficient, whether nodes have scheduling capacity, whether the system is currently in a release window, whether it is in a cooling-off period (a scale-up has just been performed and a quiet period is required), and whether there is a risk of SLO violation (such as latency approaching a threshold). For example, if the model suggests increasing the number of container replicas from 10 to 25, but the current service cluster nodes are already at full capacity, the system will mark it as "resource verification failed." If scaling down is suggested, but it coincides with a peak evening traffic period and an SLO risk warning is triggered, the system will forcibly retain the current number of replicas to avoid degrading user experience. This layered constraint or filtering mechanism ensures that every change to the number of container replicas is legal, secure, and implementable, rather than blindly pursuing automation.

[0040] After compliance checks are performed on the predicted container replica count output by the model according to the aforementioned multi-level constraints, the predicted container replica count is adjusted to ensure that the final action (a scale-up or scale-down operation) is the "optimal feasible solution" after multiple security checks. Even if the model recommends the best action, if that action is blocked by constraints, the system falls back to the container replica count of the previous period (for example, the current period that precedes the next period), or, in an SLO emergency, allows a small scale-up that stays within the limits. This process forms a closed loop of "prediction-constraint-correction," ensuring that every operation is both intelligent and safe. This adjustment process is also linked to the subsequent execution orchestration and backtracking correction modules. For example, if execution according to the adjusted target container replica count fails, or the indicators deteriorate during the backtracking process, the system will roll back or freeze, and record this decision-making process as training data for continuous optimization of the target prediction model.

[0041] In this embodiment, a complete intelligent elastic decision chain integrating perception, prediction, constraint, and execution is constructed, which breaks through the coarseness and lag of traditional static threshold-triggered container scaling and realizes highly reliable container scheduling based on business semantics, multi-dimensional constraints, security throttling, and continuous learning, providing a stable, efficient, and low-risk automated operation and maintenance method for complex online services.

[0042] By employing the embodiments provided in this application, the current state of each service is obtained at a finer granular level, and the corresponding multi-dimensional state vector is input into the intelligent prediction model. This enables accurate identification of load fluctuation information for each service within different time shards (i.e., different periods). Simultaneously, combined with multi-level constraints, it effectively filters out aggressive changes that do not meet service quality requirements or are unreasonable. Finally, based on the compliance check results, the predicted container replica count is adjusted, transforming candidate actions into executable actions after constraints, thus avoiding meaningless and high-risk frequent changes. In other words, by combining intelligent decision-making by the prediction model with compliance checks of multi-level constraints, candidate scaling actions are filtered into a safe and controllable target container replica count, thereby suppressing frequent fluctuations in the container replica count. This not only reduces the frequency and amplitude of scaling fluctuations but also improves SLO compliance rate and system stability, achieving minimal resource consumption while ensuring service quality.

[0043] As an optional example, the above compliance check on the predicted number of container replicas, based on multi-level constraints, includes:

[0044] Based on the priority of each constraint, at least one of the following constraints shall be applied to the predicted number of container replicas: replica number range constraint, single change step size constraint, cooling suppression constraint, and resource feasibility constraint.

[0045] Among them, the replica number range constraint is used to ensure that the predicted container replica number is within the preset upper and lower limits of the replica number; the single change step size constraint is used to limit a single adjustment to no more than the preset step size threshold; the cooling suppression constraint is used to block non-urgent changes when a service is in its protection period; and the resource feasibility constraint is used to verify whether the predicted container replica number can actually be deployed at the node resource, scheduling strategy, and dependent component levels.

[0046] Applying at least one of the above constraints to the predicted number of container replicas includes, but is not limited to: applying, in sequence and based on the priority of each constraint, the replica number range constraint, the single change step size constraint, the cooling suppression constraint, and the resource feasibility constraint; or applying in sequence the replica number range constraint, the single change step size constraint, and the cooling suppression constraint; or applying in sequence the single change step size constraint, the cooling suppression constraint, and the resource feasibility constraint.

[0047] In the process of executing multiple constraints, the state after the previous constraint is executed serves as the condition for the execution of the next constraint, and there is a correlation between the execution of adjacent constraints.

[0048] This embodiment further refines the specific implementation method of performing compliance checks on the predicted number of container replicas based on multi-level constraints. It clarifies the execution order of constraint processing and the specific role of each layer of constraints, and constructs a structured constraint filtering pipeline with priority logic to ensure that each scaling up and down operation is executed under safe and controllable conditions.

[0049] First, the replica count range constraint can be, but is not limited to, a requirement that the system enforce a minimum and maximum replica count for containers, preventing the replica count from reaching zero or far exceeding the cluster's capacity due to model misjudgments. For example, in a game server scenario, the system sets a minimum replica count of 2 (to ensure basic service availability) and a maximum of 50 (limited by the total node resources). If the target prediction model suggests scaling up to 60, the system will directly reduce it to 50 to ensure that no cluster-level resource exhaustion risk is triggered. This constraint is a prerequisite for all subsequent processing and constitutes a fundamental safety boundary.
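The range constraint amounts to a clamp. A minimal Python sketch, assuming the bounds 2 and 50 from the game server example as defaults; the function name `clamp_replicas` is hypothetical.

```python
def clamp_replicas(predicted: int, min_replicas: int = 2, max_replicas: int = 50) -> int:
    """Force the predicted replica count into [min_replicas, max_replicas]."""
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    return max(min_replicas, min(max_replicas, predicted))
```

A prediction of 60 is reduced to 50, and a prediction of 0 is raised to 2, so the service never drops below its availability floor.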

[0050] Secondly, the single-change step size constraint is used to limit the magnitude of a single scale-up or scale-down, preventing service instability or a scheduling avalanche caused by drastic changes. For example, even if the model indicates that the number of replicas should increase directly from 10 to 30, the system still limits the increase to a maximum of 5 replicas at a time, requiring 4 steps to complete. This allows sufficient time for container creation, warm-up, and network access, reducing the risk of sudden spikes in response latency or service overload caused by instantaneous loading. This mechanism is particularly crucial in high-concurrency applications, significantly improving the smoothness of changes.
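The step size constraint can be sketched as a bounded move toward the goal, repeated once per decision period. In the Python illustration below, `limit_step` and `plan_steps` are hypothetical names, and the default limit of 5 replicas per step is taken from the example in the text.

```python
from typing import List


def limit_step(current: int, predicted: int, max_step: int = 5) -> int:
    """Move from current toward predicted by at most max_step replicas."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    delta = predicted - current
    if delta > max_step:
        return current + max_step
    if delta < -max_step:
        return current - max_step
    return predicted


def plan_steps(current: int, goal: int, max_step: int = 5) -> List[int]:
    """List the intermediate replica counts needed to reach goal."""
    plan: List[int] = []
    while current != goal:
        current = limit_step(current, goal, max_step)
        plan.append(current)
    return plan
```

For a move from 10 to 30 replicas with a limit of 5, the plan is 15, 20, 25, 30.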

[0051] Furthermore, the cooling-off constraint introduces a "post-change quiet period" logic. For example, after a system has just completed a scaling up, any scaling down or non-urgent scaling up will be prohibited for the next three decision cycles (e.g., 30 minutes). This design stems from practical operational experience: containers typically take 1-3 minutes to go from creation to full readiness. Frequent changes can lead to a vicious cycle of scaling up and then scaling down again, wasting resources and slowing down service recovery. However, when SLO risks are triggered (e.g., p95 latency approaching the threshold), the cooling-off mechanism will be automatically bypassed to prioritize user experience.
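A minimal Python sketch of the cooling-off logic follows. It assumes the cooldown is counted in decision cycles (three, as in the example) and that an SLO risk lets only a scale-up bypass the cooldown; that second point is an interpretation made for this sketch, and `apply_cooldown` and its parameters are hypothetical names.

```python
def apply_cooldown(
    current: int,
    proposed: int,
    cycles_since_change: int,
    cooldown_cycles: int = 3,
    slo_at_risk: bool = False,
) -> int:
    """Return the replica count allowed after the cooling-off check."""
    in_cooldown = cycles_since_change < cooldown_cycles
    if not in_cooldown:
        return proposed
    # Assumption: an SLO risk makes a scale-up urgent, so it bypasses the cooldown.
    if slo_at_risk and proposed > current:
        return proposed
    # Any scale-down or non-urgent scale-up is suppressed during the quiet period.
    return current
```

One cycle after a change, a proposal to go from 15 to 20 replicas is held at 15 unless the SLO risk flag is set, in which case the scale-up to 20 goes through.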

[0052] Finally, resource feasibility constraints are the ultimate hurdle for implementation, used to verify whether the target number of replicas is achievable at the underlying infrastructure level. For example, even if the number of container replicas is within a reasonable range, the step size is compliant, and the cooldown has passed, if the service cluster does not have sufficient CPU / memory quota, the scheduler cannot place new Pods due to affinity rules, the image repository pull fails, or the database connection pool has reached its limit, the system will still determine that the resource verification has failed and refuse to make changes, in order to avoid idle running or execution failure triggering alarms.

[0053] By constructing a multi-layered priority constraint pipeline of "range → step size → cooling → resources," a systematic and progressive filtering of predicted actions is achieved. Each constraint layer has a clear business intent and engineering basis, and the layers take effect sequentially according to priority. Specifically, resource failure takes precedence over cooling suppression, and cooling suppression takes precedence over step size pruning, ensuring that the most fundamental operational risks are intercepted first. This structured and prioritized constraint processing mechanism shifts container scaling decisions from "whether it is executable" to "how to execute it safely," improving the system's robustness and controllability in complex production environments.

[0054] As an optional implementation, applying at least one of the following constraints to the predicted container replica count based on the priority of each constraint (the replica count range constraint, the single-change step size constraint, the cooling suppression constraint, and the resource feasibility constraint) includes:

[0055] If the predicted number of container replicas does not exceed the upper or lower limit of the number of replicas, the system determines whether it is in a cooldown period based on the cooling indicator. The cooldown period means that, after completing a scaling up or down operation, the system prohibits another scaling up or down operation within a preset time range.

[0056] If it is determined that the system is in a cooldown period, whether the current service quality is about to be breached is determined based on service quality indicators.

[0057] If an impending service quality breach is detected, the cooling suppression constraint is ignored and a resource feasibility check is performed;

[0058] If the return value of the resource feasibility determination function indicates that all resources and scheduling conditions have passed verification, then the resource feasibility verification is deemed successful.

[0059] First, determining whether the service is in a cooldown period based on the cooling indicator, when the predicted number of container replicas does not exceed the upper or lower limit, may include, but is not limited to, the following: after confirming that the target number of replicas is within the safety boundary, the system checks whether the current service is in a "cooldown protection period." A cooldown period is a silent window set by the system after the last scaling up or down operation to avoid frequent changes and to allow buffer time for container creation and warm-up. For example, in a video recommendation service, if the previous cycle scaled up from 15 to 20 replicas, the system enters a 30-minute (3 decision cycles) cooldown period during which any scaling down or non-emergency scaling up is prohibited. This avoids the resource waste and service instability caused by "scaling up and then scaling down, then scaling up again."

[0060] Secondly, a business health awareness capability is introduced. Service quality indicators (service level objectives, SLOs) are typically defined as key performance thresholds perceptible to users, such as an interface p95 response latency not exceeding 300ms. The system monitors in real time whether the current latency is approaching or exceeding this threshold. For example, even during the cooling-off period, if the system detects that the p95 latency has reached 280ms (close to the 300ms threshold), it determines that a "default is imminent" but has not yet occurred; if the latency has reached or exceeded the threshold, it determines that a "default has already occurred."
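The distinction between "default is imminent" and "default has already occurred" can be sketched as a simple classifier. This is a hedged illustration only: the name `classify_slo`, the enum labels, and in particular the `warn_ratio` of 0.9 (treating latency at or above 90% of the threshold as "approaching") are assumptions, since the text does not state how close to the threshold counts as imminent.

```python
from enum import Enum


class SloStatus(Enum):
    HEALTHY = "healthy"
    IMMINENT = "default is imminent"
    BREACHED = "default has already occurred"


def classify_slo(p95_ms: float, threshold_ms: float = 300.0,
                 warn_ratio: float = 0.9) -> SloStatus:
    """Classify p95 latency against the SLO threshold.

    warn_ratio is a hypothetical parameter: latency at or above
    warn_ratio * threshold_ms (but below the threshold) counts as imminent.
    """
    if p95_ms >= threshold_ms:
        return SloStatus.BREACHED
    if p95_ms >= warn_ratio * threshold_ms:
        return SloStatus.IMMINENT
    return SloStatus.HEALTHY


print(classify_slo(280.0).value)  # default is imminent
print(classify_slo(300.0).value)  # default has already occurred
```

A trend-based detector (for example, one that looks at the rate of latency increase) would fit the same interface; the fixed ratio is used here only to keep the sketch short.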

[0061] Furthermore, traditional systems prohibit changes during the cooling-off period regardless of whether the business faces risks. The technical solution in this application, however, allows the system to bypass cooling-off restrictions and proceed to resource feasibility verification even during the cooling-off period, provided that a service quality breach is imminent though it has not yet occurred. For example, during peak evening hours, even if the system has recently undergone expansion (and is still within the cooling-off period), a sudden surge in traffic can cause a continuous rise in latency. Although the SLO threshold has not yet been reached, a deteriorating trend is evident. In this case, the system determines that the risk is "predictable," bypasses the cooling-off restrictions, and immediately performs a resource feasibility check. If the cluster still has idle nodes, available images, and sufficient quotas, expansion is executed immediately, intercepting potential service degradation in advance. This design moves the system from passive waiting to "proactive prevention," improving the SLO compliance rate.

[0062] If the resource feasibility determination function returns a value indicating that all resources and scheduling conditions have passed verification, then the resource feasibility verification has been passed. This means that the final decision still prioritizes the executability of the underlying infrastructure. Even if business needs are urgent, the system will still refuse to execute if node resources are insufficient, scheduling fails, or dependent services are not ready, to avoid invalid changes.

[0063] It should be understood that the above process of applying the replica count range constraint, the single-change step size constraint, the cooling suppression constraint, and the resource feasibility constraint to the predicted number of container replicas based on the priority of each constraint is only an example, and this embodiment of the application is not limited thereto.

[0064] In this embodiment, by constructing an intelligent decision chain that executes sequentially—including cooldown period judgment, SLO status judgment, emergency bypass, and resource verification—a dynamic balance between security throttling and business priority is achieved. This overcomes the drawbacks of the rigid execution of traditional cooldown mechanisms, enabling the system to predict and proactively intervene in potential service quality risks, truly realizing a user-centric elastic scheduling logic.

[0065] As an optional example, the above adjustment of the number of container replicas for each service in the next cycle from the predicted number of container replicas to the target number of container replicas, based on the results of compliance checks, includes at least one of the following:

[0066] If the predicted number of container replicas falls outside the upper or lower limit of the number of replicas, the current change step, determined from the difference between the predicted number of container replicas and the current number of container replicas, is trimmed based on the maximum change step size to obtain the target number of container replicas.

[0067] If the cooling indicator indicates that the system is in a cooling-off period and no service quality risk has been detected, expansion or reduction operations are prohibited.

[0068] In the event that a service quality breach is detected, the cooling suppression constraint is ignored and the predicted number of container replicas is adjusted to the target number of container replicas.

[0069] Expansion or reduction operations are prohibited if the resource feasibility verification is not passed.

[0070] This embodiment further improves the complete decision-making logic based on multi-level constraints, and systematically integrates four core mechanisms: out-of-bounds pruning, cooling suppression, SLO emergency bypass, and resource feasibility. It forms a rigorous, layered, and adaptive container replica number correction closed loop, which is the key technology for achieving "intelligent throttling and safe execution" in the technical solution of this application.

[0071] In the above approach, when the predicted number of container replicas falls outside the upper or lower limit of the replica count, the current change step, determined from the difference between the predicted and current number of container replicas, is trimmed based on the maximum change step size to obtain the target number of container replicas. This means that when the target prediction model predicts an extreme value (e.g., expanding from 10 replicas to 70), the system does not blindly adopt it but first trims it to the replica count range. For example, if the system sets the maximum number of container replicas to 50, then any model output above 50 is trimmed to 50. The system then performs a secondary correction based on the maximum step size (e.g., Δn_max=5): if the current number of replicas is 10 and the range-trimmed value is 50, the difference is 40, but because each change is limited to a maximum increase of 5 replicas, the target number of replicas is 15 rather than 50. This process keeps the change gradual and controllable, avoiding instantaneous scheduling surges due to model misjudgments, thereby reducing node pressure and preventing scheduling queue congestion.

[0072] If the cooling indicators indicate that the system is in a cooling-off period and no service quality risks are detected, scaling up or down operations are prohibited, thus continuing and reinforcing the conservative principle of the cooling-off suppression mechanism. For example, suppose a live streaming service has just completed scaling up and is still within the 30-minute cooling-off period. At this time, the load is stable, the latency is normal, and the error rate is not abnormal. In this case, the system strictly prohibits any changes. Even if the model suggests fine-tuning, it must wait for the cooling-off period to end to protect system stability and prevent invalid changes driven by noise.

[0073] Upon detecting an impending service quality breach, the system ignores cooling-off constraints and adjusts the predicted number of container replicas to the target number. An impending breach may include, but is not limited to, situations where the system detects, through trend prediction, that a service level objective (SLO) metric is approaching a critical value. For example, a rapid increase in p95 latency from 250ms to 280ms (the threshold is 300ms), while not exceeding the limit, indicates entry into a high-risk zone. In this case, the system bypasses the cooling-off constraint and directly uses the trimmed target number of replicas (e.g., from 15 to 20) to perform expansion, preventing potential breaches. In gaming scenarios, this proactive response can prevent player loss due to lag, improving the user experience.

[0074] Prohibiting expansion or contraction operations when resource feasibility verification is not passed is the final safety valve in the entire decision-making chain. Even if the prediction is reasonable, the step size is compliant, and the SLO is urgent, the system will still refuse to execute if node resources are insufficient, image pull times out, affinity conflicts occur, or dependent services are not ready, to avoid cascading failures caused by idle runs or partial failures. For example, if the prediction requires adding 5 replicas, but all nodes in the cluster have CPU usage exceeding 95%, the system will return "insufficient resources" and record the reason for failure for subsequent calibration by the learner.

[0075] By constructing a complete decision-making chain with clear triggering conditions and processing strategies at each layer, a flexible scheduling paradigm based on "conservatism as the foundation, intelligence as the wings, and security as the bottom" is achieved. This not only solves the two extremes of overly rigid traditional systems but also transforms container scaling into an autonomous operation and maintenance capability with awareness and judgment, improving system robustness and resource efficiency.

[0076] As an optional example, after adjusting the number of container replicas for each service in the next cycle from the predicted number to the target number based on the compliance check results, the method further includes:

[0077] In the current state, perform scaling operations from the current number of container replicas to the target number of container replicas, and the state of each service will transition from the current state to the next state;

[0078] Based on the next state, obtain the response time reward, stability reward, error rate reward, and cost reward for performing scaling operations for each service;

[0079] The target reward after performing the scaling up / down operation is obtained by weighting and summing the response time reward, stability reward, error rate reward, and cost reward.

[0080] Based on the target reward, the scaling decision on the number of container replicas in subsequent decision cycles is adaptively optimized.

[0081] After the system executes scaling operations based on the target number of replicas output by the decision module, the service's running state changes, transitioning from the current state to a new running state. For example, an online video recommendation service that was originally running 10 container instances at a certain time period might see increased load and decide to scale up to 15. Once this is completed, the service's resource usage, response performance, and error patterns will all change, and the system will enter the next state, which forms the basis for subsequent evaluations.

[0082] The system will collect several key metrics from this new operational state to measure the overall benefits of this operation. Among them, the response time reward reflects whether the service response speed has improved. For example, a decrease in the average waiting time for user requests indicates that the expansion has effectively improved the user experience. The stability reward measures the degree of fluctuation in service performance. If the response time is more stable and the fluctuation is reduced, it indicates that the service is more reliable. The error rate reward focuses on the failure of requests. For example, a reduction in interface timeouts or disconnections indicates that the system is more robust. The cost reward comprehensively considers the resources consumed during the expansion process and the time required to start new instances. If the performance is improved after expansion, but a large amount of computing resources are consumed and several minutes of waiting time are required, it will also generate negative feedback.

[0083] The evaluation results from these four dimensions are comprehensively and weighted to arrive at an overall target reward. This reward is not a simple summation, but rather a weighted fusion based on business priorities. For example, in a video platform, user experience (response time) is the most important and has the highest weight; stability is second, error rate is third, and resource cost is a limiting factor. Through this weighted approach, the system can determine whether a scaling up or down operation is a cost-effective and successful operation, or an ineffective change that results in more harm than good.
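The weighted fusion of the four reward dimensions can be sketched as follows. The class name `RewardComponents`, the function `target_reward`, and the specific weight values are hypothetical; the text only states the ordering of priorities (response time highest, then stability, then error rate, with cost as a limiting factor), not numeric weights or reward scales.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardComponents:
    response_time: float
    stability: float
    error_rate: float
    cost: float


# Assumed weights that follow the priority order given in the text.
DEFAULT_WEIGHTS = RewardComponents(response_time=0.4, stability=0.3,
                                   error_rate=0.2, cost=0.1)


def target_reward(r: RewardComponents,
                  w: RewardComponents = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the four per-dimension rewards."""
    return (w.response_time * r.response_time
            + w.stability * r.stability
            + w.error_rate * r.error_rate
            + w.cost * r.cost)


# A scale-up that helped latency at moderate cost, and one that mostly wasted resources.
helpful = RewardComponents(response_time=0.8, stability=0.5, error_rate=0.2, cost=-0.3)
wasteful = RewardComponents(response_time=0.1, stability=0.0, error_rate=0.0, cost=-2.0)
print(round(target_reward(helpful), 2))   # 0.48
print(round(target_reward(wasteful), 2))  # -0.16
```

With these assumed numbers, the first operation yields a positive target reward and the second a negative one, matching the distinction the text draws between a cost-effective change and one that does more harm than good.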

[0084] The system records the target reward for each operation and feeds it back to the learning module as training data. In subsequent decision-making cycles, the system refers to this historical data to adjust its execution decisions for future cycles. For example, if repeatedly performing pre-peak capacity expansion before evening peak hours yields high rewards, the system will learn to automatically initiate expansion at similar times; conversely, if blindly expanding during low load leads to resource waste, the system will lower the priority of similar actions. This continuous learning based on real business feedback enables the system to automatically adapt to evolving business models, such as changes in traffic patterns during holidays or load shifts caused by the launch of new features.

[0085] As an optional implementation, the above-mentioned adaptive optimization of the scaling decision on the number of container replicas in subsequent decision cycles, based on the target reward, includes at least one of the following:

[0086] Update the structural parameters of the target prediction model;

[0087] Based on historical cumulative rewards that include the target reward, determine the confidence level of the score that the target prediction model outputs for each candidate action, and suspend the use of the scores output by the target prediction model if the confidence level is less than a preset threshold.

[0088] Using target rewards as the evaluation basis for retrospective records, a corrective action strategy is adopted to control expansion and contraction decisions.

[0089] In this embodiment, a core self-calibration and security protection mechanism for the intelligent decision-making system is constructed. Through model updates, confidence assessments, and corrective actions, a closed-loop intelligent operation and maintenance system integrating learning, verification, and fallback is achieved, ensuring that the system remains stable and controllable while continuously evolving.

[0090] First, updating the structural parameters of the target prediction model means that the system continuously optimizes the AI model used to predict the optimal number of replicas based on the actual effects of each operation. For example, if the system performs a scaling operation during a certain period and subsequently observes a significant decrease in response time, a reduction in error rate, and reasonable resource costs, this feedback will be recorded as a successful experience and used to adjust the model's internal parameters, making it more inclined to recommend similar actions when encountering similar load patterns in the future. Conversely, if a scaling operation results in resource waste but no significant performance improvement, the model will reduce its preference for that type of operation. This update can be done through offline batch training or online incremental learning, allowing the model to dynamically adapt to business development, such as gradually learning to recognize the peak characteristics of sharp increases during holidays from weekday traffic patterns.

[0091] Secondly, the system evaluates the confidence level of the model's current recommendations. For example, the model might suggest increasing the number of replicas from 5 to 8 during a specific time slot (e.g., 2 AM), but historical records show no similar load during this period. The model lacks sufficient samples to support its judgment, and its output recommendation score may be too low, indicating insufficient confidence. When the system determines that this confidence level is below a safe threshold, it immediately pauses predictions relying on that model and adopts a conservative strategy. For example, it might maintain the current number of container replicas or make minor adjustments according to a preset baseline rule. In other words, when uncertain, it prefers not to make changes rather than making blind decisions, avoiding service incidents caused by model misjudgments.
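The confidence gate described above can be sketched as follows. The confidence formula is an assumption made for illustration (the share of past non-negative rewards for comparable situations, shrunk toward zero when few samples exist); the application does not specify how confidence is computed. The names `confidence`, `choose_replicas`, and `prior_strength` are likewise hypothetical.

```python
def confidence(rewards: list[float], prior_strength: float = 5.0) -> float:
    """Assumed confidence score in [0, 1).

    Counts past operations with a non-negative reward and divides by the
    sample count plus a prior term, so sparse history yields low confidence.
    """
    if not rewards:
        return 0.0
    successes = sum(1 for r in rewards if r >= 0)
    return successes / (len(rewards) + prior_strength)


def choose_replicas(current: int, model_suggestion: int,
                    rewards: list[float], threshold: float = 0.5) -> int:
    """Follow the model only when confidence reaches the threshold; otherwise hold."""
    if confidence(rewards) < threshold:
        return current  # conservative fallback: keep the current replica count
    return model_suggestion


print(choose_replicas(5, 8, rewards=[]))          # 5 (no history for this time slot)
print(choose_replicas(5, 8, rewards=[0.4] * 20))  # 8 (ample positive history)
```

The fallback here simply keeps the current replica count; a preset baseline rule, as mentioned in the text, could be substituted at the same point.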

[0092] Finally, the reward mechanism is closely integrated with fault response. After scaling up or down, if the system detects an SLO breach, a sudden increase in error rate, or increased service jitter, it will invoke the backtracking and correction module. At this point, the system does not simply roll back, but rather considers the target reward for this operation: if the reward value is very low, it indicates that the operation itself was ineffective and is likely a misjudgment, requiring a decisive rollback; if the reward is acceptable, but subsequent metrics deteriorate, it may be due to a sudden problem in the deployment environment (such as network jitter), in which case freezing or fine-tuning is preferred over a full rollback. This correction logic based on historical assessments ensures that recovery actions are not a one-size-fits-all approach, but rather a correction based on evidence and strategy.

[0093] As another optional implementation, the above-mentioned corrective processing strategy for controlling scaling decisions, which uses the target reward as the evaluation basis for backtracking records, includes:

[0094] The target reward is decomposed into multiple component indicators, which correspond to the various constraint indicators in the multi-level constraint conditions.

[0095] Periodic offline analysis is used to execute a correction processing strategy based on the correction trigger threshold.

[0096] In this embodiment, the overall effect of scaling is decomposed into independent evaluation items in multiple dimensions, and each item corresponds directly to an actual operation and maintenance constraint. For example, the improvement in response time corresponds to the service quality guarantee constraint, the increase or decrease in resource consumption corresponds to the cost budget control constraint, the rise or fall in error rate corresponds to the service availability constraint, and the time taken to create a container corresponds to the deployment efficiency constraint. Suppose the overall reward after a scaling-up operation is above average, but decomposition shows that the cost component is strongly negative while the response time component has improved significantly. This indicates that although the operation improved the user experience, its cost was too high, and the system records it as a high-cost, low-efficiency pattern. This decomposition enables both operation and maintenance personnel and the system to identify which link has a problem, rather than concluding in general terms that "this operation is not good."

[0097] The system does not adjust immediately after every scaling operation. Instead, it regularly (for example, daily or weekly) performs in-depth aggregation analysis of historical data, and only starts to correct the strategy after discovering systematic deviations. For example, the running records of consecutive weeks show that at every Friday evening peak the model predicts a 30% scale-up, whereas only 15% is actually needed to maintain the SLO, and the resource idle rate reaches 40% after each scale-up. This indicates that the model has overfitted in learning the "Friday mode" and tends to over-provision (a "conservative tendency"). The system then automatically triggers a correction strategy after offline analysis: reducing the weight of the model's high-frequency prediction on Friday, adjusting the sensitivity of the cost item in the reward function, or even recalibrating the granularity of time slices. Similarly, if a certain type of service frequently triggers SLO alarms in the early morning but is not scaled up in time, the system will find through analysis that the "error rate" component in its historical reward has been relatively low for a long time, indicating that the model is too tolerant of error risks, and it automatically increases the weight of this component to strengthen the tendency toward emergency scale-up.
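One way to detect the kind of systematic deviation described above is to compare, over an aggregation window, the scale-up the model predicted with the scale-up that was actually needed. The sketch below is an assumption-laden illustration: the record format, the function names, the `trigger_ratio` of 1.5, and the `min_samples` guard are all hypothetical and stand in for whatever correction trigger threshold an implementation would use.

```python
from statistics import mean


def overprediction_ratio(records: list[tuple[float, float]]) -> float:
    """Mean predicted scale-up divided by mean scale-up actually needed.

    Each record is (predicted_fraction, needed_fraction), e.g. (0.30, 0.15).
    """
    predicted = mean(p for p, _ in records)
    needed = mean(n for _, n in records)
    if needed <= 0:
        raise ValueError("needed scale-up must be positive to form a ratio")
    return predicted / needed


def needs_correction(records: list[tuple[float, float]],
                     trigger_ratio: float = 1.5, min_samples: int = 4) -> bool:
    """Trigger a correction only for a persistent deviation, not a single anomaly."""
    if len(records) < min_samples:
        return False
    return overprediction_ratio(records) >= trigger_ratio


# Four Friday evening peaks: 30% predicted, 15% actually needed.
fridays = [(0.30, 0.15)] * 4
print(needs_correction(fridays))      # True
print(needs_correction(fridays[:2]))  # False (too few samples to call it systematic)
```

Requiring a minimum number of samples mirrors the text's point that the strategy is corrected only after a systematic deviation is found, which avoids policy oscillation from one-off anomalies.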

[0098] The above periodic offline analysis mechanism discovers hidden patterns and systematic deviations through a large amount of historical data, avoiding policy oscillations caused by single anomalies. It does not rely on manual intervention, but allows the system to review itself and summarize experience in a timely manner, thereby achieving the continuous evolution of the strategy.

[0099] In this embodiment, a dual optimization engine of fine-grained evaluation and periodic calibration is constructed. It disassembles the complex overall performance into multiple locatable business constraint items, making each optimization based on evidence; then through regular offline analysis, it identifies systematic problems that are difficult to detect individually but have a huge impact when accumulated over a long time. This mechanism upgrades elastic scaling from an execution feedback type to an intelligent tuning type, not only improving the resource utilization efficiency but also reducing the operation and maintenance costs.

[0100] As an optional implementation method, the above-mentioned correction processing strategy based on a correction trigger threshold through periodic offline analysis includes at least one of the following:

[0101] If any stage of the scaling up / down decision fails during the phased execution within the current verification window, the target container replica count will be restored to the current container replica count.

[0102] If the scaling up / down decision is successfully executed in stages and a service quality breach is determined based on the freeze trigger threshold, the scaling up / down decision will be suspended until the preset release conditions are met.

[0103] If the scaling up / down decision is successfully executed in stages and no service quality breach occurs, compare the error rate change of each service with the error rate threshold. If the error rate change exceeds the error rate threshold, readjust the target number of container replicas based on the current load and error rate change of each service. The error rate change is the difference between the average error rate in the current verification window and the average error rate in the historical verification window before the change in the number of container replicas. The correction trigger threshold includes the freeze trigger threshold and the error rate threshold.

[0104] This embodiment further refines how the system intelligently triggers three types of corrective strategies based on different failure or abnormal scenarios after scaling up or down, forming a set of hierarchical, priority-defined, and scenario-adaptive automatic repair mechanisms, enabling the system to have the intelligent response capability of "precise loss prevention, calm decision-making, and dynamic correction" when facing complex operation and maintenance risks.

[0105] First, during the phased execution of scaling up and down operations, if any stage fails—such as container creation failure, image pull timeout, dependent services not being ready, or health checks failing—the system will immediately terminate the subsequent process and automatically restore the number of replicas to its pre-change state. For example, a backend service plans to scale from 8 container replicas to 12. However, during the traffic access phase, newly started containers may fail to receive requests due to outdated network policies. Upon detecting this failure, the system will immediately trigger a rollback, restoring the number of replicas to 8, thus preventing traffic disruptions or failed user requests caused by the partial rollout of instances. This "failure-based rollback" mechanism is the system's most basic and highest-priority safety fallback.

[0106] Secondly, if all stages of the scaling up / down operation are successfully completed, but service performance continues to deteriorate over a period of time, reaching the preset "freeze trigger threshold," the system will pause all automatic scaling up / down decisions and enter manual confirmation mode. For example, a live streaming service may have just completed a scaling up operation, resulting in a brief improvement in response time, but subsequently, p95 latency continues to rise, and the error rate also slowly increases. The system determines that it has entered a service quality breach risk zone. At this point, the system will not continue to blindly scale up or down, but will freeze all automatic operations, retaining only monitoring and alerts, waiting for operations personnel to intervene and investigate whether the problem is a code issue, dependency failure, or external attack. This freezing mechanism effectively avoids cascading failures caused by "accumulated erroneous decisions," and is a crucial stability barrier, especially when the system has unknown dependencies or environmental drift.

[0107] Furthermore, if scaling up or down is successful and no service quality breach is triggered, the system will continue to evaluate service stability, paying particular attention to the trend of error rate changes. For example, after scaling up a recommendation service, overall latency may decrease, but the error rate may increase from 0.1% to 0.4%. Although this does not constitute a service quality breach, the change is significant. The system will analyze whether this change is caused by configuration differences introduced by the new replicas, a decrease in cache hit rate, or pressure transmission from downstream interfaces. In this case, the system will recalculate a more reasonable number of replicas based on the current load and the increase in error rate. For example, it may add 1-2 more instances to the original target to share the pressure, rather than simply rolling back. This conservative adjustment is a "fine-tuning" based on actual performance, achieving precise compensation without compromising system stability.
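The three corrective strategies above (rollback on stage failure, freeze on a service quality breach, readjustment on a significant error rate change) can be sketched as a prioritized selection. The names `VerificationWindow`, `Correction`, and `select_correction`, the use of p95 latency as the freeze trigger, and both default threshold values are assumptions for illustration; the text defines the thresholds only abstractly.

```python
from dataclasses import dataclass
from enum import Enum


class Correction(Enum):
    ROLLBACK = "restore previous replica count"
    FREEZE = "suspend automatic scaling until release conditions are met"
    READJUST = "recompute target replicas from load and error rate change"
    NONE = "no correction"


@dataclass
class VerificationWindow:
    stages_ok: bool              # all phased execution stages succeeded
    p95_ms: float                # p95 latency observed in the current window
    error_rate: float            # mean error rate in the current window
    baseline_error_rate: float   # mean error rate in the window before the change


def select_correction(w: VerificationWindow,
                      freeze_p95_ms: float = 300.0,
                      error_rate_delta_threshold: float = 0.002) -> Correction:
    """Pick at most one correction, checking the highest-priority case first."""
    if not w.stages_ok:
        return Correction.ROLLBACK
    if w.p95_ms >= freeze_p95_ms:
        return Correction.FREEZE
    if w.error_rate - w.baseline_error_rate > error_rate_delta_threshold:
        return Correction.READJUST
    return Correction.NONE


# Error rate rises from 0.1% to 0.4% while latency stays within the SLO.
window = VerificationWindow(stages_ok=True, p95_ms=180.0,
                            error_rate=0.004, baseline_error_rate=0.001)
print(select_correction(window).name)  # READJUST
```

The ordering of the checks follows the text, which calls failure-based rollback the highest-priority fallback and applies readjustment only when execution succeeded and no breach occurred.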

[0108] In this application embodiment, a container elastic scaling system for executing the above-described container copy adjustment method is also provided. This system includes, but is not limited to, a data acquisition and event logging module, a state monitoring and modeling module, a constraint handling and change throttling module, an action generation and decision-making module, an effect evaluation and reward calculation module, a learner module (optional), and an execution orchestration and backtracking correction module, as shown in Figure 3. Each module is explained in detail below.

[0109] The data acquisition and event logging module is used to collect the running status and event data of each service in the service cluster, including the number of containers, CPU / memory usage, average response time, error rate, request rate and queuing indicators, and to record scaling events, reasons for execution failures, and the time and resource consumption of service component creation / warm-up, providing a basis for subsequent decision-making and backtracking.

[0110] The status monitoring and modeling module mainly uses time-slicing. Using the following formula (1), a day is divided into multiple time segments:

[0111] (1)

[0112] in, Define the service state space as .

[0113] In this embodiment of the application, the time is recorded. The state vector is ,in, , It can include lifecycle metrics such as container creation / readiness time, as well as observational features such as CPU / memory usage, latency, and error rate; among which, state representation Can be obtained from original observations The state characteristics can be obtained through feature processing or directly constructed from monitoring indicators; both are equivalent state representation methods. This module can also align, normalize, and handle outliers of multi-source observation data, and incorporate lifecycle indicators such as container creation / readiness time into state characteristics to improve the feasibility and stability of decisions.

[0114] The constraint handling and change throttling module can be used, but is not limited to, to convert the candidate target replica count given by the action generation and decision-making module into an executable replica count, and to output the reason why a constraint was hit.

[0115] The triggering conditions are entering a decision cycle or detecting a high-risk event (such as a sudden increase in error rate, a risk of SLO violation, a release window, or a node resource shortage).

[0116] Here, the cooling indicator denotes the state of the cooling window (a value of 1 means that the service is currently cooling).

[0117] In an optional embodiment, the cooling window has a configured length measured in decision cycles: after a non-zero scale-up / scale-down change is performed, the service is treated as cooling during that number of subsequent decision cycles. When an SLO emergency is detected during this window, scale-up actions are allowed to bypass the cooling limit to reduce the risk of SLO violation.
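The cooling window can be pictured as a small countdown that is restarted by every non-zero change. The sketch below is only one possible reading of the paragraph above; the class name `CoolingWindow` and its methods are hypothetical, and the window length is expressed in decision cycles as an assumption.

```python
class CoolingWindow:
    """Tracks whether a service is inside its cooling window.

    Assumption: after a non-zero replica change, the service is treated as
    cooling for the next `length` decision cycles.
    """

    def __init__(self, length: int) -> None:
        if length < 0:
            raise ValueError("length must be non-negative")
        self.length = length
        self.remaining = 0  # decision cycles of cooling still to elapse

    def is_cooling(self) -> bool:
        return self.remaining > 0

    def tick(self, executed_delta: int) -> None:
        """Close one decision cycle, given the replica change executed in it."""
        if executed_delta != 0:
            self.remaining = self.length  # a change restarts the window
        elif self.remaining > 0:
            self.remaining -= 1           # an idle cycle consumes the window
```

Restarting the countdown on every executed change means that an emergency scale-up performed during cooling also extends the window, which is a design choice of this sketch rather than something stated in the source.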

[0118] The constraint handling and change throttling module processing can be implemented, but is not limited to, through the following formulas (2) to (8):

[0119] (2)

[0120] Here, if the flag defined by formula (2) equals 1, scaling operations are allowed even during the cooling period.

[0121] (3)

[0122] (4)

[0123] (5)

[0124] (6)

[0125] (7)

[0126] (8)

[0127] Here, the first intermediate result is the target container replica count after boundary trimming, which ensures that the candidate target replica count lies within the legal range. The second is the intermediate result after applying the maximum step size limit for a single change, which prevents the replica count from changing too much in a single scaling decision and avoids drastic fluctuations. The third is the replica count after constraint processing within the cooling window, which prevents excessively frequent scaling operations while still allowing emergency scale-up to bypass the cooling limit when there is a risk of SLO violation.

[0128] t represents the decision cycle index (usually an integer). The other quantities in formulas (2) to (8) are: the candidate target replica count output by the action generation module (unconstrained); the current replica count at time t; the final executable replica count; the lower and upper bounds on the number of container replicas; the maximum step size limit for a single change; the cooling indicator (1 indicates cooling); the length of the cooling window (in decision cycles); and the SLO emergency flag, which, when equal to 1, indicates that scale-up is permitted during cooling (for details, see formula (2) above). f represents the resource / scheduling feasibility determination function, with values in {0,1}, where 1 indicates feasibility; it covers resource / scheduling feasibility checks (quotas, node resources, scheduling reachability, image and dependency readiness, etc.). Reasons represents the set of reasons for constraint hits, returned in order of priority (such as resource verification failure, cooling suppression, etc.). The remaining symbols are intermediate results of the successive stages: the boundary stage enforces the lower and upper limits on the replica count (preventing a scale-down to 0 or a scale-up beyond the capacity of the service cluster), the step stage enforces the step size limit for a single change, and the cooling stage applies the cooling window.

[0129] It should be noted that the SLO (service level objective) guarantee can refer to, but is not limited to, a minimum guarantee: when the SLO emergency flag equals 1, the replica count is ensured to be no less than the current value. The priority for determining the reason is as follows: resource verification failure takes precedence over cooling suppression, cooling suppression takes precedence over the SLO guarantee, the SLO guarantee takes precedence over step-size limiting, step-size limiting takes precedence over pruning, and pruning takes precedence over passing.
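The constraint stages and the reason priority described above can be sketched as follows. This is an illustrative reading, not formulas (2) to (8) themselves, which are not reproduced here. The names `Limits`, `constrain`, and the reason strings are hypothetical, and falling back to the current replica count when a check blocks a change is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Reason priority as stated in the text, highest first.
PRIORITY = ["resource_check_failed", "cooling_suppression", "slo_guarantee",
            "step_limit", "pruning", "pass"]


@dataclass
class Limits:
    n_min: int      # lower bound on the replica count
    n_max: int      # upper bound on the replica count
    max_step: int   # maximum change in a single decision


def constrain(candidate: int, current: int, limits: Limits, cooling: bool,
              slo_emergency: bool,
              feasible: Callable[[int], bool]) -> Tuple[int, List[str]]:
    """Turn a candidate replica count into an executable one, with reasons."""
    reasons: List[str] = []
    # Stage 1: boundary trimming into [n_min, n_max].
    n = min(max(candidate, limits.n_min), limits.n_max)
    if n != candidate:
        reasons.append("pruning")
    # Stage 2: limit the size of a single change.
    delta = n - current
    if abs(delta) > limits.max_step:
        n = current + (limits.max_step if delta > 0 else -limits.max_step)
        reasons.append("step_limit")
    # SLO minimum guarantee: never go below the current count in an emergency.
    if slo_emergency and n < current:
        n = current
        reasons.append("slo_guarantee")
    # Stage 3: cooling window; only an emergency scale-up may bypass it.
    if cooling and n != current and not (slo_emergency and n > current):
        n = current
        reasons.append("cooling_suppression")
    # Stage 4: resource / scheduling feasibility check on the remaining change.
    if n != current and not feasible(n):
        n = current
        reasons.append("resource_check_failed")
    if not reasons:
        reasons.append("pass")
    reasons.sort(key=PRIORITY.index)
    return n, reasons
```

For example, with bounds 2 to 50, a maximum step of 5, and 10 current replicas, a candidate of 100 is first trimmed to 50 and then limited to 15, and the reasons are reported as step limiting followed by pruning.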

[0130] The action generation and decision-making module can, but is not limited to, define a set of candidate operations for each service, built from an increase step size and a reduction step size, each of which can be a fixed value or dynamically adjusted according to the load. The candidate operations are subject to the lower and upper bounds on the number of replicas and to the maximum step size limit for a single change.

[0131] It should be noted that the action generation and decision-making module generates candidate actions and assigns them priorities / scores. The final actions must be filtered by the constraint handling and change throttling module before execution. When the learner is unavailable, the confidence level is insufficient, or the constraints do not allow a change, the system can degenerate into a conservative strategy and output a safe action. The aforementioned confidence level can be determined from the learner's scores for the candidate actions. For example, it can be defined as a score gap between candidate actions (or another equivalent score-gap / consistency measure); when it is below a threshold, the confidence is considered insufficient.
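One way to read the score-gap confidence and the conservative fallback is sketched below. The source does not fix the exact confidence definition, so the gap between the best and second-best scores used here is an assumption, as are the function name `select_action` and the convention that action 0 means "remain unchanged".

```python
from typing import Dict


def select_action(scores: Dict[int, float], threshold: float,
                  safe_action: int = 0) -> int:
    """Pick a replica delta from learner scores, or fall back to a safe action.

    scores maps each candidate replica delta to the learner's score.
    Assumption: confidence = best score minus second-best score.
    """
    if len(scores) < 2:
        # Learner unavailable or nothing to compare: stay conservative.
        return safe_action
    ranked = sorted(scores.values(), reverse=True)
    confidence = ranked[0] - ranked[1]
    if confidence < threshold:
        return safe_action
    return max(scores, key=scores.get)
```

The selected delta would still have to pass the constraint handling and change throttling module before execution.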

[0132] The effect evaluation and reward calculation module is used to calculate the comprehensive evaluation and reward corresponding to each action. It can also be understood as being used for online feedback and strategy updates, and to provide evaluation references for backtracking and correction. Specifically, it is implemented through the following formula (9):

[0133] (9)

[0134] Here, the three arguments of the reward are the current state, the action performed, and the state after the transition.

[0135] (10)

[0136] (11)

[0137] (12)

[0138] The response time term is a normalized relative error penalty: the more the service response time deviates from the ideal target, the heavier the penalty; the faster the response, the higher the reward (closer to 0). A small constant is included to prevent division by zero. The stability term reflects the stability (i.e., jitter or fluctuation) of the response time.

[0139] The mean and standard deviation are those of the observed response time series, calculated within a sliding window of a given length; their counterparts are the corresponding ideal / target reference values or baseline statistics. The cost term is the total cost of the change, accumulated over the stage index of the phased execution process (see the execution orchestration module for the definition of the stage set), and the two weighting coefficients apply to the resource cost and the creation time cost, respectively.
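The reward structure described above (a weighted sum of response time, stability, error rate, and cost terms) can be sketched as follows. Formulas (9) to (12) are not reproduced in this text, so every term below is an assumed form chosen to match the prose: penalties are non-positive, a small constant guards the division, and the cost is summed over execution stages. The function name `reward` and all weight values are hypothetical placeholders.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def reward(latencies: Sequence[float], target_mean: float, target_std: float,
           error_rate: float, stage_costs: Sequence[Tuple[float, float]],
           weights: Tuple[float, float, float, float] = (0.4, 0.2, 0.2, 0.2),
           cost_weights: Tuple[float, float] = (1.0, 1.0),
           eps: float = 1e-6) -> float:
    """Weighted reward for one scaling action (assumed functional forms).

    latencies:   response times observed in the sliding window
    stage_costs: (resource_cost, creation_seconds) per execution stage
    """
    mu = mean(latencies)
    sigma = pstdev(latencies)
    # Normalized relative error penalty: 0 when at or below the target.
    r_rt = -max(0.0, (mu - target_mean) / (target_mean + eps))
    # Stability penalty: 0 when the fluctuation is at or below the baseline.
    r_stab = -max(0.0, (sigma - target_std) / (target_std + eps))
    r_err = -error_rate
    w_res, w_time = cost_weights
    r_cost = -sum(w_res * res + w_time * secs for res, secs in stage_costs)
    w_rt, w_stab, w_err, w_cost = weights
    return w_rt * r_rt + w_stab * r_stab + w_err * r_err + w_cost * r_cost
```

Under these assumptions the reward is at most 0, and it decreases as latency exceeds the target, as jitter exceeds the baseline, as errors rise, or as the change becomes more expensive.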

[0140] The learner module, as an optional module, takes a given state representation, the candidate action set, and auxiliary information related to constraints / execution (including but not limited to the cooling status, feasibility check results, and backfeed records), and outputs a score for each candidate action. It can optionally output a confidence level that characterizes the reliability of the recommendation, and / or action explanation information.

[0141] The learner module is an optional module. When the learner is not deployed, is unavailable, or has insufficient confidence (e.g., the confidence is below the threshold), the action generation and decision-making module degenerates into a conservative strategy and outputs a safe action. Safe actions include, but are not limited to, "remain unchanged", making small adjustments according to preset rules, or using the output of a baseline strategy.

[0142] The learner module can work in conjunction with the constraint processing and change throttling module to mark or filter candidate actions that are not executable or unsafe; when no action is available after filtering, a safe action is output (e.g., remain unchanged) or selected from a set of preset safe actions.

[0143] The implementation of the learner module is not limited; it can be implemented by a rule engine, a statistical model, a machine learning model, or a combination thereof. Its parameter acquisition, updating, and calibration can be done offline, online, or in a hybrid manner, and there are no restrictions on the specific model structure, training objective, or loss function.

[0144] The execution orchestration and backtracking correction module executes the executable replica count in phases and, within the verification window, determines whether to initiate backtracking and correction based on three types of trigger quantities. Before execution, a capacity availability pre-check is performed on the executable replica count through the feasibility determination function; if the pre-check fails, the module enters freeze / conservative mode and records the reason for failure. The scaling action is broken down into a set of stages: {creation, warm-up, readiness verification, traffic access}.

[0145] The stages are executed in sequence, with each stage triggered by the completion signal of the previous stage; if any stage fails, the execution failure indicator is set.
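The phased execution can be sketched as a simple sequential runner. The stage names follow the stage set given above; the function `run_stages` and the convention that each stage handler returns True on success are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

STAGES = ("creation", "warm-up", "readiness verification", "traffic access")


def run_stages(handlers: Dict[str, Callable[[], bool]]
               ) -> Tuple[bool, Optional[str]]:
    """Run the stages in order; stop at the first failure.

    Returns (execution_failed, failed_stage). Each handler returning True
    stands in for the completion signal that triggers the next stage.
    """
    for stage in STAGES:
        if not handlers[stage]():
            return True, stage   # set the execution failure indicator
    return False, None
```

Returning the failed stage alongside the indicator makes it straightforward to record a failure reason for later backtracking.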

[0146] Here, the forward verification window is the subsequent time window used to verify the effect after the current change is implemented, and the historical window is the time window before the current change that serves as the comparison benchmark. The error rate of service requests within the i-th time window (which can also be understood as a single decision period i) can be calculated, but is not limited to, by the following formula (13):

[0147] (13)

[0148] The jitter / stability within the i-th time window can be calculated, but is not limited to, by the following formulas (14) and (15):

[0149] (14)

[0150] (15)

[0151] Here, the first quantity is the coefficient of variation of the response time, used to measure the degree of fluctuation in the current service response time relative to the ideal state: a positive value indicates that the fluctuation is larger (less stable) than the ideal state, while a negative value indicates that the fluctuation is smaller (more stable) than the ideal state. The second quantity is the jitter / stability metric, a stability score that normalizes the fluctuation error to a range of 0 to 1. When fewer historical windows are available than the configured window length, only the available historical windows are included, and the trigger values are normalized by the actual window length. Given the verification window length and the SLO, error rate, and jitter trigger thresholds, the metrics are aggregated within the verification window and three types of trigger values (SLO, error rate, jitter / stability) are obtained, as shown in the following formulas (16) to (18):

[0152] (16)

[0153] , (17)

[0154] , (18)

[0155] Here, the SLO violation risk trigger indicator is a Boolean value of 0 or 1. A value of 0 indicates that the SLO (service level objective) has been met and there is no risk of violation. A value of 1 indicates that the risk of continued SLO violation is high; subsequent scaling decisions are then frozen and automatic changes are suspended until manual intervention or until the risk is resolved.

[0156] The error rate deterioration trigger is a Boolean value of 0 or 1. It is calculated from the error rate after the change and the error rate before the change: if the difference between the two is less than a threshold, the trigger is not set; otherwise, it is set. A value of 1 indicates a significant increase in the error rate, which triggers correction.

[0157] If the error rate is significantly higher after the change than before, the error rate trigger is set to 1, indicating that the system deems the current scaling operation potentially excessive and that a conservative adjustment (usually a remedial scale-up) is required. Compared with the freezing driven by the SLO trigger, the correction driven by the error rate trigger is gentler, involving only small adjustments rather than a complete halt.

[0158] The jitter / stability degradation trigger indicator is a Boolean value of 0 or 1. It is calculated by subtracting the jitter before the change from the jitter after the change and comparing the difference with a threshold: if the difference is less than the threshold, the trigger is not set; otherwise, it is set. A value of 1 indicates a significant increase in jitter, which triggers correction.

[0159] The quantities in formulas (16) to (18) are: the proportion / frequency of SLO violations within the window after the change; the error rate after the change; the error rate before the change; the average stability / jitter level over the subsequent decision cycles after the scaling operation is executed; and the historical average jitter level before the scaling operation was performed (as a baseline).
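The window metrics and the three triggers can be sketched as follows. Formulas (13) to (18) are not reproduced in this text, so the forms below are assumptions consistent with the prose: the error rate is a failed-to-total ratio, the jitter score clips the relative deviation of the coefficient of variation into the range 0 to 1, and each trigger fires when its aggregated value reaches its threshold. All function and parameter names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def error_rate(failed: int, total: int) -> float:
    """Share of failed requests in one window (0.0 for an empty window)."""
    return failed / total if total else 0.0


def jitter_score(latencies: Sequence[float], ideal_cv: float,
                 eps: float = 1e-6) -> float:
    """Stability score in [0, 1]; 0 means no worse than the ideal state."""
    cv = pstdev(latencies) / (mean(latencies) + eps)
    deviation = (cv - ideal_cv) / (ideal_cv + eps)  # > 0: less stable than ideal
    return min(1.0, max(0.0, deviation))


def _avg(values: Sequence[float]) -> float:
    # Averaging over the windows actually available normalizes by the
    # actual window length when history is short.
    return mean(values) if values else 0.0


def triggers(slo_violations: Sequence[int],
             err_after: Sequence[float], err_before: Sequence[float],
             jit_after: Sequence[float], jit_before: Sequence[float],
             th_slo: float, th_err: float, th_jit: float) -> Tuple[int, int, int]:
    """Return the (SLO, error rate, jitter) trigger indicators as 0 or 1."""
    g_slo = int(_avg(slo_violations) >= th_slo)
    g_err = int(_avg(err_after) - _avg(err_before) >= th_err)
    g_jit = int(_avg(jit_after) - _avg(jit_before) >= th_jit)
    return g_slo, g_err, g_jit
```

Here `slo_violations` holds one 0 / 1 flag per decision cycle in the verification window, so its average is the violation proportion described above.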

[0160] The set of backtracking and correction actions is defined as follows: {pass, rollback, freeze, adjust target replica count}, where rollback refers to restoring the container replica count to its previous state; freezing refers to suspending subsequent scaling decisions until manual intervention or until the triggering conditions are removed; and adjusting the target replica count refers to recalculating the target replica count based on the current load when the error rate / jitter triggers fire, specifically calculated using the following formula (19):

[0161] (19)

[0162] Through the above adjustment mechanism, the replica count can be adjusted one step in the direction of improvement (scaling up when the error rate or jitter worsens). Corrective actions are generated according to the following priority: execution failure takes precedence over SLO triggering, and SLO triggering takes precedence over error rate triggering and jitter triggering. See the following formula (20) for details:

[0163] (20)
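The priority among corrective actions can be sketched as a short decision function. Formulas (19) and (20) are not reproduced in this text; mapping execution failure to rollback and the SLO trigger to freeze follows the descriptions elsewhere in this embodiment, while the one-step scale-up capped at the upper bound is an assumed form of the adjustment. The function name `correction_action` and the action strings are hypothetical.

```python
from typing import Optional, Tuple


def correction_action(exec_failed: bool, g_slo: int, g_err: int, g_jit: int,
                      current: int, n_max: int,
                      step: int = 1) -> Tuple[str, Optional[int]]:
    """Choose one corrective action by priority.

    Priority: execution failure > SLO trigger > error rate / jitter trigger.
    Returns (action, new_target), where new_target is set only for "adjust".
    """
    if exec_failed:
        return "rollback", None
    if g_slo:
        return "freeze", None
    if g_err or g_jit:
        # One step in the direction of improvement, within the upper bound.
        return "adjust", min(current + step, n_max)
    return "pass", None
```

Because the checks are ordered, an execution failure yields a rollback even when the SLO trigger is also set.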

[0164] The above reasons for failure include, but are not limited to, the following: {insufficient quota, insufficient node resources, scheduling failure, image pull failure or timeout, dependency not ready, readiness probe failure, network anomaly}.

[0165] At the same time, the execution orchestration and backtracking correction module records backfeed data for each change. This record is used for subsequent candidate action filtering, throttling parameter adjustment, and learner calibration.

[0166] In a preferred embodiment, the game server containers of an online game system are elastically scaled. Game logic services (such as room services / battle services / combat services) experience peak concurrency during evenings and holidays, and the load exhibits significant bursts and regional variations. At the same time, player experience is highly sensitive to latency and jitter; scaling needs to avoid frequent fluctuations and ensure that instances become ready quickly. The system configuration is as follows: a time slice length of 10 minutes; a maximum number of containers of 50; a minimum number of containers of 2; a service level objective (SLO) example: the p95 latency threshold for session establishment or critical interfaces, or the p95 latency threshold for the game frame synchronization / command processing link; and reward coefficients, one of which is set to 0.4.

[0167] During operation, the system continuously collects observational features related to the game business, including but not limited to concurrent online users, room creation rate, matching queue length, number of sessions per instance, CPU / memory usage, critical link p95 latency, jitter (coefficient of variation), and error rate / disconnection rate; and records scaling events, instance creation and readiness time, warm-up and access traffic phase time and failure reasons for subsequent backtracking, correction and policy calibration.

[0168] The action generation and decision-making module provides the candidate target replica count based on the state features, and the constraint handling and change throttling module selects the executable replica count within the lower and upper replica bounds, under constraints such as the single-step size limit and the cooling window. When an SLO violation risk is detected (e.g., when the corresponding risk indicator reaches its threshold), scale-up is allowed to bypass the cooling limit to reduce the risk of degrading the player experience.

[0169] The orchestration and backtracking correction module implements scaling up and down in a phased sequence of "creation → warm-up → readiness verification → traffic access," and aggregates SLO, error rate / disconnection rate, and jitter trigger rate within the verification window: if execution fails, it rolls back; if there is a persistent SLO risk, it freezes or triggers correction adjustments; if only the error rate or jitter significantly worsens, it performs a conservative adjustment to the target replica count. Optionally, the learner module can use historical logs and backfeed records to score and calibrate candidate actions; when the learner is unavailable or has insufficient confidence, the system automatically degenerates into a conservative strategy but can still rely on constraint filtering, throttling, and backtracking correction closed-loop operation for stable operation.

[0170] The various parameters in the above embodiments (such as the time slice length and the reward weight coefficients) can be adjusted automatically by the system or configured manually. Each module can be implemented in software or deployed on a hardware control unit. The above-mentioned container replica adjustment system can be integrated with container orchestration systems such as Kubernetes and Docker, and has good compatibility.

[0171] Based on the above analysis, it can be seen that the technical solution of this application has at least the following beneficial effects:

[0172] (1) Time-sliced state-space modeling: It can identify the periodic fluctuation characteristics of the service in different time periods;

[0173] (2) Constraint filtering and change throttling: Filter executable actions and limit the frequency and magnitude of changes under constraints such as quota / budget, minimum and maximum replica count and cooling window to reduce scaling jitter;

[0174] (3) Multi-dimensional effect evaluation mechanism: combining performance response, stability (coefficient of variation), error rate and resource and creation time cost to achieve multi-objective trade-off;

[0175] (4) Offline initialization and online adaptation: The learner can be initialized, calibrated and periodically updated using historical logs and backfeed records to adapt to load drift; when the learner is unavailable or the confidence is insufficient, it automatically degenerates into a conservative strategy and works in conjunction with constraint filtering, change throttling and backtracking correction mechanisms to reduce invalid changes and jitter, reduce resource waste and improve stability.

[0176] (5) Phased execution and backtracking correction: Pre-check and schedule the execution of actions, and roll back / freeze / correct when the effect is not up to standard or the execution fails, thereby improving the SLO compliance rate and stability.

[0177] In addition, compared with the traditional scaling method that is triggered by only a single indicator threshold, the container replica adjustment system in this application incorporates creation latency, cooling throttling, and SLO constraints into the same decision chain, and reduces the risk of excessive scaling and cascading degradation through post-execution backtracking evaluation and correction, thereby reducing resource waste while meeting service quality requirements.

[0178] It should be noted that, for the sake of simplicity, the foregoing method embodiments are all described as a series of actions. However, those skilled in the art should understand that this application is not limited to the described order of actions, as some steps may be performed in other orders or simultaneously according to this application. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily essential to this application.

[0179] According to another aspect of the embodiments of this application, an adjustment device for a container copy is also provided, as shown in Figure 4, the device comprising:

[0180] The first acquisition unit 402 is used to acquire the current status of each service used to carry the target business in the current period, wherein the current status includes the current number of container replicas, resource usage information and response latency;

[0181] The first processing unit 404 is used to obtain the predicted number of container replicas for the next cycle by inputting the current multidimensional state vector corresponding to the current state into the target prediction model.

[0182] The second processing unit 406 is used to perform compliance checks on the predicted number of container replicas based on multi-level constraints. The predicted number of container replicas is used to indicate the actions to be executed in the next cycle. The actions to be executed represent expanding or shrinking the number of container replicas.

[0183] The first adjustment unit 408 is used to adjust the number of container replicas for each service in the next cycle from the predicted number of container replicas to the target number of container replicas based on the compliance check results.

[0184] Optionally, the second processing unit 406 includes:

[0185] The first processing module is used to apply at least one of the following constraints to the predicted number of container replicas based on the priority of each constraint: replica number range constraint, single change step size constraint, cooling suppression constraint, and resource feasibility constraint. The replica number range constraint is used to ensure that the predicted number of container replicas is within the preset upper and lower limits of the number of replicas. The single change step size constraint is used to ensure that a single adjustment does not exceed the preset step size threshold. The cooling suppression constraint is used to block non-urgent changes when each service is in the protection period. The resource feasibility constraint is used to verify whether the predicted number of container replicas can actually be deployed at the node resource, scheduling strategy, and dependent component levels.

[0186] Optionally, the first processing module mentioned above includes:

[0187] The first processing submodule is used to determine whether it is in a cooling period based on the cooling index if the predicted number of container replicas does not exceed the upper and lower limits of the number of replicas. The cooling period means that after completing a scaling up or down operation, the scaling up or down operation is prohibited from being performed again within a preset time range.

[0188] The second processing submodule is used to determine, based on service quality indicators, whether the current service quality is in violation if it is determined that the service is in the cooling period.

[0189] The third processing submodule is used to ignore the cooling suppression constraint and perform resource feasibility verification if it is detected that the current service quality is in violation.

[0190] The fourth processing submodule is used to determine that the resource feasibility verification is passed if the return value of the resource feasibility determination function indicates that all resources and scheduling conditions have passed the verification.

[0191] Optionally, the first adjustment unit 408 includes:

[0192] The second processing module is configured to perform at least one of the following: if the predicted number of container replicas exceeds the upper or lower limit of the number of replicas, prune the current change amount, which is determined from the difference between the predicted number of container replicas and the current number of container replicas, based on the maximum change step size, to obtain the target number of container replicas; if the cooling index indicates that the system is in a cooling period and no service quality risk is detected, prohibit the scaling up or scaling down operation; if a service quality violation is detected, ignore the cooling suppression constraint and adjust the predicted number of container replicas to the target number of container replicas; if the resource feasibility verification fails, prohibit the scaling up or scaling down operation.

[0193] Optionally, the above-mentioned device further includes:

[0194] The third processing unit is used to perform scaling operations from the current number of container replicas to the target number of container replicas in the current state, and the state of each service is transferred from the current state to the next state.

[0195] The second acquisition unit is used to acquire the response time reward, stability reward, error rate reward, and cost reward for performing scaling operations for each service based on the next state.

[0196] The fourth processing unit is used to perform a weighted summation of the response time reward, stability reward, error rate reward, and cost reward to obtain the target reward after performing the scaling up and down operation;

[0197] The fifth processing unit is used to adaptively optimize the scaling decision of the number of container replicas in the decision cycle after the next cycle based on the target reward.

[0198] Optionally, the fifth processing unit mentioned above includes:

[0199] The update module is used to update the structural parameters of the target prediction model;

[0200] The second processing module is used to determine the confidence level of the score of each candidate action based on the historical cumulative reward, including the target reward, and to suspend the use of the score output by the target prediction model if the confidence level is less than a preset threshold.

[0201] The control module is used to control the correction processing strategy for scaling decisions, using the target reward as the evaluation basis in the backtracking record.

[0202] Optionally, the above control module includes:

[0203] The decomposition submodule is used to decompose the target reward into multiple component indicators, where the multiple component indicators correspond to the various constraint indicators in the multi-level constraint conditions.

[0204] The fourth processing submodule is used to execute a correction processing strategy based on the correction trigger threshold through periodic offline analysis.

[0205] It should be noted that the embodiments of the container copy adjustment device here can refer to the embodiments of the container copy adjustment method described above, and will not be repeated here.

[0206] According to another aspect of the embodiments of this application, an electronic device for implementing the above-described method for adjusting a container copy is also provided. This electronic device may be the target terminal or server shown in Figure 1. This embodiment is described using the electronic device as an example. As shown in Figure 5, the electronic device includes a memory 502 and a processor 504. The memory 502 stores a computer program, and the processor 504 is configured to execute the steps of any of the above method embodiments through the computer program.

[0207] Optionally, the aforementioned electronic device may be located in at least one of a plurality of network devices of a computer network.

[0208] Optionally, the processor described above can be configured to perform the following steps via a computer program:

[0209] S1, obtain the current status of each service used to carry the target business in the current period, where the current status includes the current number of container replicas, resource usage information and response latency;

[0210] S2, obtain the predicted number of container replicas for the next cycle by inputting the current multidimensional state vector corresponding to the current state into the target prediction model;

[0211] S3, perform compliance checks on the predicted number of container replicas based on multi-level constraints, where the predicted number of container replicas is used to indicate the action to be executed in the next cycle, and the action to be executed represents expanding or shrinking the number of container replicas;

[0212] S4, based on the compliance check results, adjust the number of container replicas for each service in the next cycle from the predicted number of container replicas to the target number of container replicas.
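Steps S1 to S4 form one decision cycle, which can be sketched as a function that wires four pluggable callables together. The function name `decision_cycle` and the callable signatures are hypothetical; the prediction model and the constraint check are passed in rather than implemented, since the source does not limit their form.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


def decision_cycle(observe: Callable[[], State],
                   predict: Callable[[State], int],
                   check: Callable[[int, State], Tuple[int, List[str]]],
                   apply: Callable[[int], None]) -> Tuple[int, List[str]]:
    """Run one decision cycle and return (target_replicas, reasons)."""
    state = observe()                          # S1: current status of the service
    predicted = predict(state)                 # S2: predicted replica count
    target, reasons = check(predicted, state)  # S3: multi-level compliance check
    apply(target)                              # S4: adjust to the target count
    return target, reasons
```

A caller would typically invoke this once per decision cycle for each service, supplying a `check` that applies the range, step size, cooling, and feasibility constraints.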

[0213] Optionally, as those skilled in the art will understand, the structure shown in Figure 5 is for illustrative purposes only and does not limit the structure of the aforementioned electronic device. For example, the electronic device may also include more or fewer components than those shown in Figure 5 (such as network interfaces, etc.), or have a configuration different from that shown in Figure 5.

[0214] The memory 502 can be used to store software programs and modules, such as the program instructions / modules corresponding to the container replica adjustment method and apparatus in this embodiment. The processor 504 executes various functional applications and data processing by running the software programs and modules stored in the memory 502, thereby implementing the aforementioned container replica adjustment method. The memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memory 502 may further include memory remotely located relative to the processor 504, and these remote memories can be connected to the terminal via a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof. Specifically, the memory 502 may be used, but is not limited to, to store the current number of container replicas, the predicted number of container replicas, and compliance check results, etc. As an example, as shown in Figure 5, the memory 502 may include, but is not limited to, the first acquisition unit 402, the first processing unit 404, the second processing unit 406, and the first adjustment unit 408 of the container copy adjustment device. Furthermore, it may include, but is not limited to, other module units of the container copy adjustment device, which will not be elaborated upon in this example.

[0215] Optionally, the transmission device 506 described above is used to receive or send data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 506 includes a Network Interface Controller (NIC), which can be connected to other network devices and a router via a network cable to communicate with the Internet or a local area network. In another example, the transmission device 506 is a Radio Frequency (RF) module, used for wireless communication with the Internet.

[0216] In addition, the aforementioned electronic device also includes: a display 508 for displaying the service screen (such as a game screen) of each service carried after the number of container replicas is adjusted; and a connection bus 510 for connecting the various module components in the aforementioned electronic device.

[0217] In other embodiments, the target terminal or server described above can be a node in a distributed system. This distributed system can be a blockchain system formed by multiple nodes connected through network communication. The nodes can form a peer-to-peer network, and any type of computing device, such as a server or target terminal, can become a node in the blockchain system by joining this peer-to-peer network.

[0218] According to another aspect of this application, a computer program product or computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the container replica adjustment method provided in the various optional implementations of the above-described server verification processing, wherein the computer program is configured to execute, at runtime, the steps in any of the above-described method embodiments.

[0219] Optionally, in this embodiment, the computer-readable storage medium described above may be configured to store a computer program for performing the following steps:

[0220] S1, obtain the current state of each service carrying the target business in the current cycle, where the current state includes the current number of container replicas, resource usage information, and response latency;

[0221] S2, obtain the predicted number of container replicas for the next cycle by inputting the current multidimensional state vector corresponding to the current state into the target prediction model;

[0222] S3, perform a compliance check on the predicted number of container replicas based on multi-level constraints, where the predicted number of container replicas indicates the action to be executed in the next cycle, and the action to be executed is an operation that scales the number of container replicas up or down;

[0223] S4, based on the compliance check result, adjust the number of container replicas of each service in the next cycle from the predicted number of container replicas to the target number of container replicas.
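
Steps S1 to S4 can be pictured as a small pipeline. The following Python sketch is illustrative only: the field names, the fixed threshold rule standing in for the target prediction model, and the range-only compliance check are all assumptions made for the example, not details given in this application.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """S1: current state of one service (field names are hypothetical)."""
    replicas: int
    cpu_utilization: float  # resource usage, 0.0 to 1.0
    latency_ms: float       # response latency


def to_state_vector(state: ServiceState) -> list[float]:
    """Flatten the current state into a multidimensional state vector."""
    return [float(state.replicas), state.cpu_utilization, state.latency_ms]


def predict_replicas(vector: list[float]) -> int:
    """S2 stand-in: a fixed rule used in place of a trained prediction model."""
    replicas, cpu, latency_ms = vector
    if cpu > 0.8 or latency_ms > 200.0:
        return int(replicas) + 2
    if cpu < 0.3 and latency_ms < 50.0:
        return int(replicas) - 1
    return int(replicas)


def compliance_check(predicted: int, min_replicas: int, max_replicas: int) -> bool:
    """S3 placeholder: only a replica range constraint is checked here."""
    return min_replicas <= predicted <= max_replicas


def next_cycle_replicas(state: ServiceState,
                        min_replicas: int = 1,
                        max_replicas: int = 10) -> int:
    """S4: keep a compliant prediction, otherwise clamp it into the range."""
    predicted = predict_replicas(to_state_vector(state))
    if compliance_check(predicted, min_replicas, max_replicas):
        return predicted
    return max(min_replicas, min(max_replicas, predicted))
```

In this sketch a service with 10 replicas under heavy load would be predicted at 12 and adjusted back to the assumed upper limit of 10.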

[0224] Optionally, in embodiments of this application, the terms "module" or "unit" refer to a computer program or part of a computer program that has a predetermined function and works with other related parts to achieve a predetermined goal, and can be implemented wholly or partially using software, hardware (such as processing circuitry or memory), or a combination thereof. Similarly, a processor (or multiple processors or memory) can be used to implement one or more modules or units. Furthermore, each module or unit can be part of an overall module or unit that includes the functionality of that module or unit.

[0225] Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the various methods of the above embodiments can be implemented by a program instructing the hardware related to the target terminal. The program can be stored in a computer-readable storage medium, which may include: flash drive, read-only memory (ROM), random access memory (RAM), disk or optical disk, etc.

[0226] The sequence numbers of the embodiments in this application are merely for description and do not represent the superiority or inferiority of the embodiments. If the integrated units in the above embodiments are implemented as software functional units and sold or used as independent products, they can be stored in the aforementioned computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause one or more computer devices (which may be personal computers, servers, or network devices, etc.) to execute all or part of the steps of the methods in the various embodiments of this application.

[0227] In the above embodiments of this application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of other embodiments. It should be understood that, in the several embodiments provided in this application, the disclosed client can be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of units is only a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through interfaces, and the indirect coupling or communication connection between units or modules may be electrical or take other forms.

[0228] The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected to achieve the purpose of this embodiment according to actual needs. Furthermore, the functional units in the various embodiments of this application can be integrated into one processing unit, or each unit can exist physically separately, or two or more units can be integrated into one unit. The integrated units described above can be implemented in hardware or as software functional units.

[0229] The above are merely preferred embodiments of this application. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of this application, and these improvements and modifications should also be considered within the scope of protection of this application.

Claims

1. A method for adjusting container replicas, characterized in that it comprises: obtaining the current state of each service carrying the target business in the current cycle, wherein the current state includes the current number of container replicas, resource usage information, and response latency; obtaining the predicted number of container replicas for the next cycle by inputting the current multidimensional state vector corresponding to the current state into a target prediction model; performing a compliance check on the predicted number of container replicas based on multi-level constraints, wherein the predicted number of container replicas indicates the action to be executed in the next cycle, and the action to be executed is an operation that scales the number of container replicas up or down; and, based on the compliance check result, adjusting the number of container replicas of each service in the next cycle from the predicted number of container replicas to a target number of container replicas.

2. The method according to claim 1, characterized in that performing the compliance check on the predicted number of container replicas based on multi-level constraints comprises: applying, based on the priority of each constraint, at least one of the following constraints to the predicted number of container replicas: a replica number range constraint, a single change step size constraint, a cooling suppression constraint, and a resource feasibility constraint; wherein the replica number range constraint ensures that the predicted number of container replicas lies within the preset upper and lower limits of the replica number; the single change step size constraint limits a single adjustment so that it does not exceed a preset step size threshold; the cooling suppression constraint blocks non-urgent changes while a service is in its protection period; and the resource feasibility constraint verifies whether the predicted number of container replicas can actually be deployed at the node resource, scheduling strategy, and dependent component levels.
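
The four constraints named in claim 2 can be read as independent checks evaluated in a fixed order. The Python sketch below is a minimal illustration under stated assumptions: the `Limits` fields, the default values, the boolean inputs, and the evaluation order (taken from the order in which the claim lists the constraints) are hypothetical and are not specified by the claim.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical preset limits for one service."""
    min_replicas: int = 1
    max_replicas: int = 20
    max_step: int = 3


def check_constraints(current: int,
                      predicted: int,
                      limits: Limits,
                      in_protection_period: bool,
                      urgent: bool,
                      deployable: bool) -> list[str]:
    """Return the names of violated constraints, in the assumed priority order."""
    violations = []
    # Replica number range constraint: stay within preset upper and lower limits.
    if not limits.min_replicas <= predicted <= limits.max_replicas:
        violations.append("replica_range")
    # Single change step size constraint: bound the size of one adjustment.
    if abs(predicted - current) > limits.max_step:
        violations.append("single_change_step")
    # Cooling suppression constraint: block non-urgent changes in the protection period.
    if in_protection_period and not urgent:
        violations.append("cooling_suppression")
    # Resource feasibility constraint: `deployable` stands in for node, scheduling
    # and dependency verification, which this sketch does not model.
    if not deployable:
        violations.append("resource_feasibility")
    return violations
```

An empty list means the predicted number passes every check; otherwise the first entry is the highest-priority violation under the assumed ordering.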

3. The method according to claim 2, characterized in that applying, based on the priority of each constraint, at least one of the replica number range constraint, the single change step size constraint, the cooling suppression constraint, and the resource feasibility constraint to the predicted number of container replicas comprises: if the predicted number of container replicas does not exceed the upper and lower limits of the replica number, determining, based on a cooling index, whether the service is in a cooling period, wherein the cooling period means that, after a scale-up or scale-down operation completes, another scale-up or scale-down operation is prohibited within a preset time range; if it is determined that the service is not in the cooling period, performing a violation determination based on service quality indicators; if no service quality violation is found, ignoring the cooling suppression constraint and performing a resource feasibility check; and if the return value of the resource feasibility determination function indicates that all resource and scheduling conditions have passed verification, determining that the resource feasibility check has passed.

4. The method according to claim 1, characterized in that adjusting, based on the compliance check result, the number of container replicas of each service in the next cycle from the predicted number of container replicas to the target number of container replicas comprises at least one of the following: if the predicted number of container replicas falls outside the upper and lower limits of the replica number, clipping the current change amount, determined from the difference between the predicted number of container replicas and the current number of container replicas, to the maximum change step size to obtain the target number of container replicas; if the cooling index indicates that no service quality risk has been detected, prohibiting scale-up or scale-down operations; if an impending service quality violation is detected, ignoring the cooling suppression constraint and adjusting the predicted number of container replicas to the target number of container replicas; and if the resource feasibility check is not passed, prohibiting scale-up or scale-down operations.
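
One way to read the four branches of claim 4 is as a function that maps check outcomes to a target replica count. The sketch below is an interpretation, not the claimed method: it assumes that "prohibited" means keeping the current count, that the cooling branch applies only while the service is in its cooling period, and that a clipped value is additionally clamped into the allowed range. All parameter names are hypothetical.

```python
def resolve_target(current: int,
                   predicted: int,
                   *,
                   min_replicas: int,
                   max_replicas: int,
                   max_step: int,
                   in_cooling: bool,
                   breach_imminent: bool,
                   feasible: bool) -> int:
    """Derive the target replica count from the compliance check outcomes."""
    # Resource feasibility not passed: scaling is prohibited, keep the current count.
    if not feasible:
        return current
    # Cooling period with no detected service quality risk: scaling is prohibited.
    # An imminent violation skips this branch, i.e. cooling suppression is ignored.
    if in_cooling and not breach_imminent:
        return current
    # Prediction outside the replica range: clip the change to the maximum step.
    if not min_replicas <= predicted <= max_replicas:
        delta = predicted - current
        clipped = max(-max_step, min(max_step, delta))
        # Assumption: the clipped result is also clamped into the allowed range.
        return max(min_replicas, min(max_replicas, current + clipped))
    return predicted
```

For example, with limits 1 to 10 and a maximum step of 3, a prediction of 15 from a current count of 4 yields 7 in this sketch.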

5. The method according to claim 1, characterized in that, after adjusting, based on the compliance check result, the number of container replicas of each service in the next cycle from the predicted number of container replicas to the target number of container replicas, the method further comprises: in the current state, performing a scaling operation from the current number of container replicas to the target number of container replicas, whereby the state of each service transitions from the current state to the next state; based on the next state, obtaining the response time reward, stability reward, error rate reward, and cost reward of the scaling operation for each service; obtaining the target reward of the scaling operation as a weighted sum of the response time reward, the stability reward, the error rate reward, and the cost reward; and, based on the target reward, adaptively optimizing the scaling decision for the number of container replicas in decision cycles after the next cycle.
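
The weighted sum in claim 5 can be written out directly. In the Python sketch below, the weight values and the convention that each component reward is a plain float are assumptions chosen for illustration; the claim does not state specific weights or reward scales.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the claim does not specify their values."""
    response_time: float = 0.4
    stability: float = 0.2
    error_rate: float = 0.3
    cost: float = 0.1


def target_reward(response_time: float,
                  stability: float,
                  error_rate: float,
                  cost: float,
                  weights: RewardWeights = RewardWeights()) -> float:
    """Weighted sum of the four component rewards observed in the next state."""
    return (weights.response_time * response_time
            + weights.stability * stability
            + weights.error_rate * error_rate
            + weights.cost * cost)
```

Raising the cost weight relative to the response time weight would bias later decisions toward fewer replicas, which is the kind of trade-off the weighting is meant to expose.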

6. The method according to claim 5, characterized in that adaptively optimizing, based on the target reward, the scaling decision for the number of container replicas in subsequent decision cycles comprises at least one of the following: updating the structural parameters of the target prediction model; determining, based on historical cumulative rewards including the target reward, the confidence level of the score that the target prediction model outputs for each candidate action, and suspending use of the scores output by the target prediction model if the confidence level is below a preset threshold; and using the target reward as the evaluation basis of backtracking records to control the corrective processing strategy for the scaling decision.
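
The confidence-based suspension in claim 6 might be sketched as a gate in front of the model's action scores. The claim does not say how confidence is computed, so the measure used here (the share of recent rewards above a baseline), the sliding window, and the fallback action are all assumptions made for the example.

```python
from collections import deque


class ConfidenceGate:
    """Suspends use of model scores when recent rewards suggest low confidence."""

    def __init__(self, window: int = 5, baseline: float = 0.0, threshold: float = 0.6):
        self.rewards = deque(maxlen=window)  # recent target rewards
        self.baseline = baseline
        self.threshold = threshold

    def record(self, reward: float) -> None:
        """Store the target reward obtained after a scaling operation."""
        self.rewards.append(reward)

    def confidence(self) -> float:
        """Assumed measure: fraction of recent rewards above the baseline."""
        if not self.rewards:
            return 0.0
        good = sum(1 for r in self.rewards if r > self.baseline)
        return good / len(self.rewards)

    def choose(self, model_scores: dict[int, float], fallback_action: int) -> int:
        """Pick the best-scored action, or the fallback when confidence is low."""
        if self.confidence() < self.threshold:
            return fallback_action
        return max(model_scores, key=model_scores.get)
```

Here the action keys are replica deltas such as `-1`, `0`, and `1`, and a fallback of `0` keeps the current replica count while the model's scores are suspended.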

7. The method according to claim 6, characterized in that using the target reward as the evaluation basis of backtracking records to control the corrective processing strategy for the scaling decision comprises: decomposing the target reward into multiple component indicators, wherein the component indicators correspond to the respective constraint indicators in the multi-level constraints; and executing, through periodic offline analysis, the corrective processing strategy based on a correction trigger threshold.

8. An apparatus for adjusting container replicas, characterized in that it comprises: a first acquisition unit, configured to obtain the current state of each service carrying the target business in the current cycle, wherein the current state includes the current number of container replicas, resource usage information, and response latency; a first processing unit, configured to obtain the predicted number of container replicas for the next cycle by inputting the current multidimensional state vector corresponding to the current state into a target prediction model; a second processing unit, configured to perform a compliance check on the predicted number of container replicas based on multi-level constraints, wherein the predicted number of container replicas indicates the action to be executed in the next cycle, and the action to be executed is an operation that scales the number of container replicas up or down; and a first adjustment unit, configured to adjust, based on the compliance check result, the number of container replicas of each service in the next cycle from the predicted number of container replicas to a target number of container replicas.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium includes a stored program, wherein the program, when run by a terminal device or computer, performs the method according to any one of claims 1 to 7.

10. An electronic device comprising a memory and a processor, characterized in that, The memory stores a computer program, and the processor is configured to perform the method as described in any one of claims 1 to 7 via the computer program.