Active defense method for adversarial denial of service attack in micro-service scenario

By introducing zero-sum game modeling and MDDQN into the microservice architecture, combined with multi-resource thresholds and lightweight incremental training, the problems of high detection difficulty and low resource utilization of RDoS attacks in the microservice architecture are solved. This enables efficient and adaptive defense strategy optimization and collaborative scaling, thereby improving the security and service quality of the system.

CN121333708B · Active · Publication Date: 2026-06-26 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents(China)
Current Assignee / Owner: SOUTHEAST UNIV
Filing Date: 2025-10-24
Publication Date: 2026-06-26

Patent Timeline

24 Oct 2025: Application
26 Jun 2026: Publication (CN121333708B)

IPC: H04L9/40

AI Tagging

Technology Topics

Quality of service; Attack

AI Technical Summary

Technical Problem

In a microservice architecture, resource exhaustion denial-of-service attacks (RDoS) are characterized by highly similar traffic and business operations, strong concealment, and difficulty in detection. Existing defense methods have limited applicability in the microservice application layer, making it difficult to balance real-time performance and resource utilization. Furthermore, they are prone to misjudgment and failure when attack strategies change dynamically.

Method used

By introducing zero-sum game modeling and multidimensional discrete deep Q-network (MDDQN), we optimize the defense strategy through adversarial training, achieve multi-node collaborative scaling by combining multiple resource thresholds, use a lightweight incremental training mechanism for iterative optimization of the strategy, and build a joint decision model for precise scaling deployment.

Benefits of technology

It achieves efficient adaptive defense in dynamic environments, improves the security and service quality of microservice systems, reduces the risk of service-level objective (SLO) violations and resource jitter, and improves defense response efficiency and resource utilization.

✦ Generated by Eureka AI based on patent content.

Abstract

The application discloses an active defense method against adversarial denial-of-service attacks in a microservice scenario. First, multi-source metric collection probes are deployed in the microservice system and combined with centralized aggregation processing. Then, the attack and defense process of the system is modeled as a zero-sum game between two reinforcement learning agents. An adversarial training mechanism optimizes the defense strategy through alternating updates, so that the defender gradually converges to a robust optimal strategy. On this basis, a multi-dimensional discrete deep Q-network is introduced. Through shared feature extraction and independent multi-head outputs, it efficiently generates multi-dimensional scale-up and scale-down defense decisions, and a loss balancing mechanism accelerates convergence. Finally, a closed-loop optimization mechanism triggers lightweight incremental training when new attack patterns are detected or defense effectiveness decreases, dynamically updating part of the network weights to keep the strategy adaptive. This scheme achieves intelligent, automated and efficient defense of microservice systems in adversarial attack environments, significantly improving system security and service quality.

Description

Technical Field

[0001] This invention relates to a proactive defense method against adversarial denial-of-service attacks in a microservices scenario, belonging to the field of proactive defense technology.

Background Technology

[0002] With the rapid development of cloud computing and containerization technologies, microservice architecture has gradually become the mainstream design pattern for modern distributed systems. By breaking down complex monolithic applications into multiple functionally independent, autonomously deployable, and scalable service units, it effectively improves the flexibility and scalability of the system. However, microservice architecture also brings complex service call chains and high dependencies, significantly increasing the difficulty of resource scheduling and management. In multi-tenant, multi-node, and multi-link collaborative environments, microservice systems often expose a larger attack surface and potential risks, placing higher demands on system stability and security.

[0003] Traditional Distributed Denial-of-Service (DDoS) attacks often rely on large-scale traffic flooding to directly impact network and host resources. However, in microservice scenarios, attackers often exploit the complex dependencies and bottleneck characteristics between services to launch resource exhaustion denial-of-service (RDoS) attacks with low traffic and high resource consumption. These attacks can not only propagate through service call chains to create cascading blocking effects, but also induce frequent system scaling up and down by dynamically switching attack targets, resulting in resource waste and performance fluctuations. Compared to traditional DDoS, RDoS is characterized by highly similar traffic and business logic, strong concealment, and high detection difficulty, making it a major security challenge seriously threatening the availability and cost-effectiveness of microservice systems.

[0004] In recent years, proactive defense has gradually become an important direction for dealing with application-layer DDoS attacks. Existing research has proposed moving target defense (MTD), deception defense, mimicry defense, and container migration-based mitigation schemes. These methods increase the cost to attackers and reduce the success rate of attacks through dynamic and randomized mechanisms. However, most of these methods rely on network layer or low-level resource scheduling, limiting their applicability to microservice application layers. Furthermore, some methods suffer from frequent actions and excessive computational overhead, making it difficult to balance real-time performance and resource utilization. In addition, existing defense methods relying on machine learning or deep learning are often highly sensitive to attack traffic characteristics, lack generalization capabilities, and are prone to misjudgment and failure when facing dynamically changing attack strategies.

[0005] To address the aforementioned issues, this patent proposes a proactive defense method against adversarial denial-of-service attacks in microservice scenarios. This method introduces a zero-sum game model to model the attack and defense process, utilizing adversarial training to make the defense strategy robust against dynamic attackers. Simultaneously, a multidimensional discrete deep Q-network (MDDQN) is employed to achieve multi-node collaborative scaling, and by jointly considering multi-dimensional resource thresholds such as CPU, memory, and latency, collaborative defense scheduling across service links is completed. Compared to traditional methods relying on static detection or heuristic strategies, this patented solution can achieve adaptive iterative optimization of the defense strategy in dynamic environments, effectively improving the overall protection capabilities of microservice systems in terms of resource utilization, service quality, and security.

Summary of the Invention

[0006] This invention proposes a proactive defense method against adversarial denial-of-service attacks in a microservices scenario. First, by deploying multi-source indicator collection probes within the microservice system and combining them with centralized aggregation processing, automated collection of multi-dimensional indicators such as runtime resources, latency, and packet loss rate, as well as system state matrix modeling, are achieved. Then, the system's attack and defense process is modeled as a zero-sum game problem involving two reinforcement learning agents. An adversarial training mechanism is used to optimize the defense strategy through alternating updates, gradually leading the defenders to a robust optimal strategy. Building upon this, a multi-dimensional discrete deep Q-network is introduced. Through shared feature extraction and multi-head independent output, efficient generation of multi-dimensional scaling defense decisions is achieved, and a loss balancing mechanism accelerates convergence. At the decision execution level, a joint judgment model is constructed by integrating multiple resource thresholds, enabling precise scaling deployment and configuration distribution for attacked services, thereby ensuring service continuity and stability. Finally, a closed-loop optimization mechanism is incorporated. Lightweight incremental training is triggered when new attack patterns are detected or the defense effectiveness declines, dynamically updating some network weights to maintain the strategy's adaptability. By adopting the technical solution of this invention, microservice systems can achieve intelligent, automated, and efficient defense in adversarial attack environments, significantly improving system security and service quality.

[0007] This invention adopts the following technical solution: a proactive defense method against adversarial denial-of-service attacks in a microservice scenario, the method comprising the following steps:

[0008] Step (1): Resource monitoring and environmental modeling.

[0009] Step (2): Defense algorithm based on MDDQN.

[0010] Step (3): Adversarial training based on zero-sum game.

[0011] Step (4): Scaling up and down deployment based on multiple resource thresholds.

[0012] Step (5): Strategy iteration and optimization.

[0013] The specific details of the steps are as follows:

[0014] Step (1): Resource monitoring and environmental modeling, specifically as follows: In the microservice system, an observation metric collection application is deployed for each node and each service replica. For example, native metric collection interfaces such as the Kubernetes Metrics API and Istio, or custom metric collection probes, can be used to report the observed metrics of the deployed application, including resource usage, latency, and packet loss. A centralized processing program then periodically collects the metrics of all replicas of each service and aggregates them into service-level observed metrics, such as total service resource usage, average latency, and request failure rate, and normalizes them into a resource vector. Finally, the metric vectors of all service nodes are combined into a state matrix that represents the microservice system at the current moment.

[0015] Step (2): The defense algorithm based on MDDQN is as follows. The system state matrix is encoded by a shared feature extraction layer, and an independent output head is set for each action dimension. Each output head corresponds to one dimension of the action space, so the network outputs a multi-dimensional action vector as its decision. For a given state, the k-th output head gives the expected value (Q-value) of taking each candidate action in the k-th resource dimension. For strategy selection, an ε-greedy exploration strategy is adopted: when exploring, a random action is selected independently for each dimension; otherwise, the current optimal action is selected independently for each dimension. For loss propagation, each output head calculates its Q-value and obtains the loss of its dimension, and backpropagation is performed with the mean of the per-dimension losses as the total loss. This enables rapid convergence of multidimensional scaling decisions.

[0016] Step (3): Zero-sum game-based adversarial training, specifically as follows: First, the attack and defense process of the microservice system is modeled as a zero-sum game. The attacker, as an adversarial agent, dynamically selects the attack target and attack intensity according to the environmental state and the defense strategy, and applies perturbations to the system. The defender learns and optimizes the defense strategy to ensure system security and service quality. To ensure fairness, both the attacker and the defender are driven by the MDDQN algorithm. The state matrices of both sides are the same, obtained from the modeling in step (1); the action spaces of the attacker and the defender are different. The attacker updates its attack strategy through adversarial training, and its action space covers the choice of attack target and attack intensity. The defender resists attacks by deciding on multidimensional scaling thresholds, which make up its action space. A reward function for the defender is then constructed based on system security metrics and quality of service, and the attacker's reward function is the negative of the defender's reward function, which makes the game zero-sum. Finally, both sides adopt an alternating update strategy: the attacker first iterates continuously, conducting adversarial training to update its attack strategy; the defender then iterates continuously, updating its defense strategy to adapt to the attack. Training alternates in this way for a total of [number missing] cycles, until the strategies of both sides converge. At this point, the attacker has reached its optimal attack strategy, and the defender achieves the best defensive effect against the adversarial attack.

[0017] Step (4): Scaling up and down deployment based on multiple resource thresholds, as follows. This method does not consider only a single CPU or memory load, but integrates multiple resource thresholds to build a joint judgment mechanism. First, the defense action vector obtained from the decision in step (3) is taken as the input of this module. For each resource dimension it provides a threshold vector consisting of a bottleneck threshold and a target threshold. The former is used to determine which services need to be scaled up or down, while the latter is the resource level that the service is expected to reach after scaling. Then, the scaling strategy is determined by a joint decision over the multiple resource thresholds. For each service, the scaling action is first determined from the relationship between the real-time observed metrics and the thresholds. If the real-time observed metrics of the service are below the bottleneck threshold in all dimensions, no action is taken. If the real-time observed metric is less than or equal to the target threshold, capacity is expanded by the expansion factor; otherwise, capacity is reduced by the reduction factor. Next, the results calculated for the individual resource dimensions are aggregated into the scaling action of the service: if any resource dimension needs to be scaled up or down, the scaling action is executed, and the maximum scaling factor across all resource dimensions is used as the scaling factor of the service. Finally, based on the new number of replicas, new Deployment configurations are generated for all services that need to be scaled up or down and distributed to the microservice control node for execution. This enables coordinated scaling and rapid response for all attacked nodes in the microservice system.

[0018] Step (5): Strategy iteration and optimization, specifically as follows: After the defender obtained in steps (2) and (3) is deployed in the actual environment, the defense strategy needs to be fine-tuned and updated based on the actual defense effect of the system. When a new attack mode appears in the system, the defense effect continues to decline, or the service topology is adjusted, the incremental training process is triggered. Lightweight alternating training is performed starting from the original network parameters, and the latest defense strategy is obtained by updating only some of the network weights. This closed-loop optimization mechanism avoids strategy oscillation and provides adaptive defense capability.

[0019] In step (1), resource monitoring and environmental modeling, this invention provides a resource monitoring framework that combines proactive reporting from multiple nodes with centralized metric aggregation, and applies it to environment state modeling, as detailed below:

[0020] Step (1.1) Metric collection and reporting: Deploy the observation metric collection application on each node of the microservice system and on its service replicas. The collection application is implemented through the Kubernetes Metrics API, the Istio native metrics interface, or a custom probe, and collects operational metrics including resource usage, latency, and packet loss rate.

[0021] Step (1.2) Metric collection and aggregation: A centralized processing program collects operational metrics from all nodes and service replicas at preset intervals, then aggregates the data from the multiple replicas of the same service. Three aggregation methods are used, depending on the metric. Total resource usage is aggregated by summation: the resource usage metric of a service is the sum of that metric over its replicas. Average latency is aggregated by averaging: the latency metric of a service is the mean of the latencies of its replicas. Request failure rate is aggregated by weighted averaging: the failure rate of a service is the average of the replica failure rates, weighted by the number of requests handled by each replica. The aggregated metrics of all dimensions together form the service observation metric vector.

[0022] Step (1.3) Normalization to a resource vector: Each component of the aggregated service observation metric vector is min-max normalized, that is, the minimum value of the observed metric is subtracted and the result is divided by the difference between its maximum and minimum values. The result is the normalized observation vector of the service.

[0023] Step (1.4) State matrix modeling: Combine the resource vectors of all service nodes to form the system state matrix, which describes the overall operational status of the microservice system at the current moment.
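Steps (1.2) to (1.4) can be illustrated with a minimal Python sketch. It is not the patented implementation: the record type `ReplicaMetrics`, its field names, the three chosen metrics, and the clamping of normalized values to [0, 1] are assumptions made for illustration, since the original formulas are not available in this text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicaMetrics:
    """Hypothetical per-replica record; field names are illustrative."""
    cpu: float           # resource usage of this replica
    latency_ms: float    # observed latency of this replica
    failure_rate: float  # fraction of failed requests, in [0, 1]
    requests: int        # requests handled, used as the weight


def aggregate_service(replicas: List[ReplicaMetrics]) -> List[float]:
    """Aggregate replica metrics into one service-level vector (step 1.2)."""
    if not replicas:
        raise ValueError("a service needs at least one replica")
    total_cpu = sum(r.cpu for r in replicas)                             # summation
    avg_latency = sum(r.latency_ms for r in replicas) / len(replicas)    # mean
    total_requests = sum(r.requests for r in replicas)
    if total_requests > 0:
        # Failure rate weighted by the request count of each replica.
        failure = sum(r.failure_rate * r.requests for r in replicas) / total_requests
    else:
        failure = 0.0
    return [total_cpu, avg_latency, failure]


def min_max_normalize(vec: List[float], lo: List[float], hi: List[float]) -> List[float]:
    """Min-max normalize each component (step 1.3); clamping is an assumption."""
    out = []
    for x, a, b in zip(vec, lo, hi):
        if b == a:
            out.append(0.0)  # degenerate range, avoid division by zero
        else:
            out.append(min(1.0, max(0.0, (x - a) / (b - a))))
    return out


def build_state_matrix(services: Dict[str, List[ReplicaMetrics]],
                       lo: List[float], hi: List[float]) -> List[List[float]]:
    """Stack one normalized vector per service, in sorted name order (step 1.4)."""
    return [min_max_normalize(aggregate_service(reps), lo, hi)
            for _, reps in sorted(services.items())]
```

In this sketch each row of the returned matrix belongs to one service and each column to one observed metric; the bounds `lo` and `hi` would come from historical minima and maxima of each metric.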

[0024] In step (2), the defense algorithm based on MDDQN, this invention provides an MDDQN algorithm with multiple independent output heads to achieve efficient multi-dimensional action decisions and convergence of policy learning, as detailed below:

[0025] Step (2.1) State-space encoding: MDDQN consists of a shared layer and multi-dimensional output heads. The system state matrix obtained in step (1) is fed into the shared feature extraction layer to extract a high-dimensional state representation.

[0026] Step (2.2) Independent output by multiple heads: After the system state matrix has been encoded by the shared feature extraction layer, an independent output head is set for each action dimension, with each output head corresponding to one dimension of the action space. The decision output is a multi-dimensional action vector. For a given state, the k-th output head gives the expected value of taking each candidate action in the k-th resource dimension.

[0027] Step (2.3) Multidimensional action output: For strategy selection, an ε-greedy exploration strategy is adopted. When exploring, a random action is selected independently for each dimension; otherwise, the current optimal action, that is, the action with the highest value in that head, is selected independently for each dimension.

[0028] Step (2.4) Multidimensional loss fusion: Each output head calculates its Q-value and obtains the loss of its dimension from the target value of the k-th dimension, which guides the model update. The losses of all dimensions are summed and averaged, and this mean loss is used for backpropagation. This enables rapid convergence of multidimensional scaling decisions.
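The structure described in steps (2.1) to (2.4) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the class `MultiHeadQ`, the single ReLU shared layer, the random initialization, the per-dimension ε test, and the squared-error form of the per-head loss are all choices made here, because the original network definition and loss formula are missing from the text. The state matrix is assumed to be flattened into one input vector.

```python
import random
from typing import List


def matvec(w: List[List[float]], x: List[float]) -> List[float]:
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class MultiHeadQ:
    """Shared feature layer followed by one Q-value head per action dimension."""

    def __init__(self, in_dim: int, hidden: int, n_dims: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.shared = init(hidden, in_dim)                          # shared extractor
        self.heads = [init(n_actions, hidden) for _ in range(n_dims)]  # one head per dimension

    def q_values(self, state: List[float]) -> List[List[float]]:
        """Return q[k][a]: value of action a in dimension k for this state."""
        h = [max(0.0, v) for v in matvec(self.shared, state)]  # ReLU features
        return [matvec(head, h) for head in self.heads]


def select_actions(q: List[List[float]], epsilon: float, rng: random.Random) -> List[int]:
    """Epsilon-greedy choice made independently for each action dimension."""
    actions = []
    for q_k in q:
        if rng.random() < epsilon:
            actions.append(rng.randrange(len(q_k)))                       # explore
        else:
            actions.append(max(range(len(q_k)), key=q_k.__getitem__))     # exploit
    return actions


def mean_td_loss(q: List[List[float]], actions: List[int], targets: List[float]) -> float:
    """Mean over dimensions of the squared error between target and chosen Q-value."""
    losses = [(y - q_k[a]) ** 2 for q_k, a, y in zip(q, actions, targets)]
    return sum(losses) / len(losses)
```

With K action dimensions of n choices each, this layout needs K·n outputs rather than the n^K outputs of a single joint head, which is the reduction in combinatorial complexity that the decomposition aims for.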

[0029] In step (3), adversarial training based on the zero-sum game, this invention constructs an adversarial training framework for two reinforcement learning agents based on the zero-sum game concept, and obtains the optimal defense model through alternating training, as detailed below:

[0030] Step (3.1) Modeling and selection of the attacking and defending parties: The attack and defense interaction process of the microservice system is modeled as a zero-sum game in which both parties take their input from the same state space. Both the attacker and the defender employ the MDDQN algorithm. The attacker continuously refines its adversarial attack methods based on the environment and the defense strategy, while the defender iteratively optimizes its defense strategy.

[0031] Step (3.2) Action space construction: Construct action spaces for both the attacker and the defender. The attacker's action space is the value space of the attack paths and the attacked resource dimensions, while the defender's action space is the value space of the services to be scaled and their scaling factors.

[0032] Step (3.3) Zero-sum game-based reward function: Construct the defender's reward function with system security and service quality as the goal, and define the attacker's reward function as the negative of the defender's reward function. A positive gain for one side therefore always means an equal negative gain for the other side, satisfying the zero-sum game constraint.

[0033] Step (3.4) Attack and defense adversarial training: An alternating update strategy is adopted. The attacker first iterates continuously, conducting adversarial training to update its attack strategy; the defender then iterates continuously, updating its defense strategy to adapt to the attack. Training alternates in this way for a total of [number missing] cycles, until the strategies of both sides converge. At this point, the attacker has reached its optimal attack strategy, and the defender achieves the best defensive effect against the adversarial attack.
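The zero-sum reward of step (3.3) and the alternating schedule of step (3.4) can be outlined as below. This is a schematic sketch only: the function names, the callback interface, and the parameters `cycles`, `attacker_steps`, and `defender_steps` are hypothetical, and the iteration counts are left as parameters because the values in the original text are missing. A convergence check is omitted.

```python
from typing import Callable, List, Tuple


def attacker_reward(defender_reward: float) -> float:
    """Zero-sum constraint: the attacker receives the negative of the defender's reward."""
    return -defender_reward


def alternating_training(update_attacker: Callable[[], None],
                         update_defender: Callable[[], None],
                         cycles: int,
                         attacker_steps: int,
                         defender_steps: int) -> List[Tuple[int, str]]:
    """Alternate blocks of attacker updates and defender updates.

    While one side is updated, the other side's policy is assumed to be
    held fixed by the caller. Returns a log of (cycle, role) entries.
    """
    log: List[Tuple[int, str]] = []
    for cycle in range(cycles):
        for _ in range(attacker_steps):
            update_attacker()          # attacker adapts to the current defense
            log.append((cycle, "attacker"))
        for _ in range(defender_steps):
            update_defender()          # defender adapts to the updated attack
            log.append((cycle, "defender"))
    return log
```

In practice each callback would run one MDDQN training step for its agent against the shared environment, and the outer loop would stop early once neither policy changes noticeably between cycles.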

[0034] In step (4), scaling up and down deployment based on multiple resource thresholds, this invention designs a scaling generation strategy based on multi-resource collaboration to quickly mitigate attack traffic across multiple nodes, as detailed below:

[0035] Step (4.1) Multidimensional action vector input: The defense action vector obtained from the decision in step (3) is taken as the input of this module. For each resource dimension it provides a threshold vector consisting of a bottleneck threshold and a target threshold. The former is used to determine whether a service needs to be scaled up or down, while the latter is the resource level that the service is expected to reach after scaling.

[0036] Step (4.2) Scaling strategy for each resource dimension: For each resource dimension, the bottleneck threshold and target threshold are analyzed, and the scaling action is determined from the relationship between the real-time observed metrics and the thresholds. If the real-time observed metrics of a service are below the bottleneck threshold in all dimensions, no action is taken. If the real-time observed metric is less than or equal to the target threshold, capacity is expanded by the expansion factor; otherwise, capacity is reduced by the reduction factor.

[0037] Step (4.3) Service-level scaling strategy: At the service level, the scaling decisions of the individual resource dimensions are aggregated. If any resource dimension needs to be scaled up or down, the scaling action is executed, and the maximum scaling factor across all resource dimensions is used as the scaling factor of the service.

[0038] Step (4.4) Service deployment configuration update and distribution: Based on the new number of replicas, generate new Deployment configurations for all services that need to be scaled up or down, and distribute them to the microservice control node for execution. This enables coordinated scaling and rapid response for all attacked nodes in the microservice system.
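The joint threshold decision of steps (4.2) and (4.3) can be sketched as follows. The per-dimension factor formulas and the direction of the comparison are not recoverable from this text, so the sketch makes an explicit assumption: each dimension's factor is the ratio of the observed metric to its target threshold, the convention used by the Kubernetes Horizontal Pod Autoscaler, so that a factor above 1 expands and a factor below 1 reduces. The function names and the small rounding tolerance are also assumptions.

```python
import math
from typing import List


def service_scaling_factor(observed: List[float],
                           bottleneck: List[float],
                           target: List[float]) -> float:
    """Joint decision over all resource dimensions of one service.

    Returns 1.0 (no action) when every observed metric is below its
    bottleneck threshold. Otherwise returns the maximum per-dimension
    factor, assumed here to be observed / target (targets must be > 0).
    """
    if all(o < b for o, b in zip(observed, bottleneck)):
        return 1.0
    return max(o / t for o, t in zip(observed, target))


def new_replica_count(current: int, factor: float) -> int:
    """Replica count after scaling; never drops below one replica."""
    # The small tolerance keeps float noise from adding a spurious replica.
    return max(1, math.ceil(current * factor - 1e-9))
```

For example, with observed metrics `[0.75, 0.25]`, bottleneck thresholds `[0.7, 0.7]`, and target thresholds `[0.5, 0.5]`, the first dimension exceeds its bottleneck, the service factor is 1.5, and a service with 2 replicas would be scaled to 3. The resulting replica count would then be written into the Deployment configuration distributed in step (4.4).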

[0039] In step (5), strategy iteration and optimization, this invention applies a lightweight incremental training mechanism to update the defense model in a dynamic environment, as detailed below:

[0040] Step (5.1) Defense Effectiveness and System Status Monitoring: Deploy the defense strategies obtained through steps (2) and (3) in the actual operating environment, continuously monitor system security and service quality indicators, and trigger the strategy optimization process when new attack patterns are detected, defense effectiveness declines, or service topology is adjusted.

[0041] Step (5.2) Lightweight training for model update: Incremental lightweight alternating training is performed based on the original model parameters, updating only some network weights to obtain the latest defense strategy, thereby achieving closed-loop optimization, avoiding policy oscillation and improving the system's adaptive defense capability.
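The trigger of step (5.1) and the partial update of step (5.2) can be illustrated with a short sketch. The text does not say which weights are updated or how a decline in defense effectiveness is measured, so everything below is an assumption: effectiveness is compared through a mean recent reward against a baseline with a hypothetical `tolerance`, and the update is a plain gradient step applied only to the parameter groups named in `trainable` (for example, the output heads, with the shared layer frozen).

```python
from typing import Dict, List, Sequence, Set


def should_retrain(new_attack_detected: bool,
                   topology_changed: bool,
                   recent_rewards: Sequence[float],
                   baseline_reward: float,
                   tolerance: float) -> bool:
    """Decide whether to start incremental training (step 5.1)."""
    if new_attack_detected or topology_changed:
        return True
    if not recent_rewards:
        return False
    mean_recent = sum(recent_rewards) / len(recent_rewards)
    # Effectiveness has declined if the recent mean falls below the baseline band.
    return mean_recent < baseline_reward - tolerance


def incremental_update(params: Dict[str, List[float]],
                       grads: Dict[str, List[float]],
                       trainable: Set[str],
                       lr: float) -> Dict[str, List[float]]:
    """Gradient step on the trainable groups only; other groups are copied unchanged."""
    updated: Dict[str, List[float]] = {}
    for name, weights in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # frozen group
    return updated
```

Starting from the deployed parameters and touching only a subset of them keeps each retraining round short, which is one plausible way to realize the lightweight, oscillation-avoiding update the step describes.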

[0042] An electronic device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the proactive defense method against adversarial denial-of-service attacks in a microservices scenario.

[0043] A computer-readable storage medium storing computer instructions thereon, which, when executed by a processor, implement the proactive defense method against adversarial denial-of-service attacks in a microservices scenario.

[0044] Compared with the prior art, the present invention has the following advantages:

[0045] (1) Lightweight and scalable distributed data reporting and modeling: The resource monitoring mechanism proposed in this invention adopts a distributed reporting mode, which can make full use of the existing interfaces in the microservice system, such as the Kubernetes Metrics API and the built-in metrics of Istio. It can also extend the collection scope through custom probes and support the reporting of different types of resource metrics. This approach is lightweight, flexible and scalable. Through centralized aggregation calculation, various service-level metrics (such as total resource usage, average latency, failure rate, etc.) can be flexibly combined and generated, and then normalized into resource vectors to form a system state matrix, providing a unified and efficient state representation for subsequent defense decisions.

[0046] (2) Multidimensional Defense Decision-Making and Fast Convergence Based on MDDQN: Unlike traditional DQN, which requires joint decision-making in the overall action space, this invention splits the action space into multiple dimensions and makes independent decisions. This mechanism significantly reduces the combinatorial complexity brought about by the dimensionality of the action space, allowing each dimension of the action to be optimized independently. Combined with the ε-greedy policy and multidimensional loss mean fusion, this method can achieve faster policy convergence while ensuring exploratory nature, thus maintaining good training efficiency and defense performance in high-dimensional action environments.

[0047] (3) Robust defense strategy under adversarial training: This invention models the attack and defense process as a zero-sum game problem, and introduces alternating optimization between the attacker and defender in the defense strategy training. Under this framework, even if the attacker adopts the optimal attack strategy, the defender can still converge to the optimal response strategy through adversarial training. By adopting the dual reinforcement learning agent adversarial training method, the attacker and defender conduct alternating training and eventually converge to their respective best strategies. The resulting defender can exhibit stronger generalization and stability in a threat environment with adversarial, dynamic and complex characteristics. Compared with strategies optimized only under static or expected conditions, the adversarial training of this invention can effectively suppress strategy vulnerability and improve the robust protection effect when facing adaptive attackers.

[0048] (4) Multi-resource threshold-driven collaborative scaling: This invention proposes a multi-resource joint threshold mechanism, which simultaneously determines scaling requirements across multiple resource dimensions such as CPU and memory, and adaptively calculates the scaling factor for each service (based on the ratio between the bottleneck threshold and the target threshold). Then, multi-dimensional results are aggregated and batch-deployed at the service level, supporting collaborative scaling for multiple services. Compared to the traditional "single-node scaling" approach, this invention enables global collaborative scheduling when attack flows spread and link-level bottlenecks form, rapidly improving the system's ability to absorb sudden malicious traffic and reducing the risk of service-level objective (SLO) violations and resource jitter, thereby achieving more efficient defense response and resource utilization.

Attached Figure Description

[0049] Figure 1 is a framework diagram of the proactive defense method against adversarial denial-of-service attacks in a microservices scenario;

[0050] Figure 2 is a system architecture diagram of the proactive defense method against adversarial denial-of-service attacks in a microservices scenario.

Detailed Implementation

[0051] Based on the steps described above, this section gives some examples that implement the technical solution steps described in the specification.

[0052] Example 1:

[0053] A proactive defense method against resource exhaustion-type adversarial denial-of-service attacks in a microservices scenario.

[0054] Step (1): Resource monitoring and environmental modeling.

[0055] Step (2): Defense algorithm based on MDDQN.

[0056] Step (3): Adversarial training based on zero-sum game.

[0057] Step (4): Scaling up and down deployment based on multiple resource thresholds.

[0058] Step (5): Strategy iteration and optimization.

[0059] Step (1), resource monitoring and environmental modeling, is as follows. First, the experimental environment is introduced. A microservice auto-scaling system based on muBench was built on a virtual machine, together with a social network application that references the microservice topology of DeathStarBench socialNetwork. The application is composed of a set of service nodes, and the dependencies between services form an edge set. The propagation of request traffic along the links of the microservice cluster can be modeled as a traffic transfer matrix, so the cluster can be described as a directed acyclic graph. The microservice system is deployed with Kubernetes Helm, so the Kubernetes API can be used to report node and service resource information, which is represented as a resource vector. Because the defense is considered under the condition that the nodes have sufficient resources to support scaling up and down, only service resource vectors are collected. The resource consumption of a microservice node depends on its traffic volume, and different services consume different resources. In addition to the real-time resource vector, the number of replicas can be obtained through the Kubernetes Metrics API, and a custom metric collection probe can be deployed within the service and aggregated with Prometheus to obtain latency, transmission rate, and request failure rate. Aggregating these metrics gives the observation vector of each service, which is then normalized. Finally, periodic data collection and processing yields the real-time state matrix of the microservice system, whose elements represent the observation metrics of each service at the current time.

[0060] Step (2), the defense algorithm based on MDDQN, is as follows. The MDDQN method feeds the system state matrix into a shared feature extraction layer and then sets an independent output head for each action dimension. Each output head corresponds to one dimension of the action space, so that the network outputs a multi-dimensional action vector as its decision. The k-th output head gives the state-action values of the actions available on the k-th resource dimension in the current state. For strategy selection, an ε-greedy exploration strategy is adopted: when the exploration rate is high, a random action is selected independently for each dimension; when the exploration rate is low, the current optimal action is selected independently for each dimension. For loss propagation, each output head calculates its Q-value and obtains the loss of its dimension, and the mean of the losses over all dimensions is used as the total loss for backpropagation. This enables rapid convergence of multi-dimensional scaling decisions.

[0061] Step (3), adversarial training based on a zero-sum game, is as follows. First, the attack and defense process of the microservice system is modeled as a zero-sum game. The attacker, as an adversarial agent, dynamically selects the attack target and attack intensity according to the environmental state and the defense strategy, and perturbs the system. The defender learns and optimizes the defense strategy to ensure system security and service quality. To ensure fairness, both the attacker and the defender are driven by the MDDQN algorithm, and both use the same state matrix, obtained from the data collection and modeling in step (1). The attacker can dynamically change the attack path and the attack intensity. An RDoS attacker launches attacks through the service interfaces exposed by the microservice system, so there is a finite number of selectable attack paths, and attack traffic of a certain intensity can be launched on each path. Each attacker has an upper limit on its total attack resources, i.e., the attack traffic summed over all paths cannot exceed this limit. The attacker's action space therefore consists of vectors that give the attack traffic on each path. The defender resists attacks by deciding multi-dimensional scaling thresholds, i.e., it determines a scaling threshold vector for each dimension of the resource vector, with each threshold taking values in a bounded range; its action space consists of these threshold vectors. Next, the defender's reward function is constructed from system security metrics and quality of service. It consists of multiple weighted components, namely the security reward, the number of dangerous services, the link failure rate, the number of idle replicas, and the action execution penalty. The attacker's reward function is the negative of the defender's reward function, which makes the game zero-sum. Finally, both sides adopt an alternating update strategy: first the attacker iterates for a number of consecutive rounds, conducting adversarial training to update its attack strategy; then the defender iterates for a number of consecutive rounds, updating its defense strategy to adapt to the attacks. Training alternates in this way for a total of [number missing] cycles, until both sides' strategies converge. At this point, the attacker has reached its optimal attack strategy, and the defender achieves the best defensive effect against the adversarial attack.

[0062] Step (4), scaling up and down deployment based on multiple resource thresholds, is as follows. This method does not consider only a single CPU or memory load; it integrates multi-dimensional resource thresholds to construct a joint judgment mechanism. First, the defense action vector obtained from the decision in step (3) is taken as the input of this module. For each resource dimension it contains a threshold vector consisting of the bottleneck threshold and the target threshold. The former is used to determine the services that need to be scaled up or down, while the latter represents the resource level that the service is expected to reach after scaling up or down. Then, the scaling strategy is determined by the joint decision over multiple resource thresholds. For each service, the scaling action is first determined from the relationship between the real-time observed indicators and the target threshold. If the real-time observed metrics of a service are below the bottleneck threshold in all dimensions, no action is taken. If the real-time observed index is less than or equal to the target threshold, the service is scaled up by an expansion factor; otherwise it is scaled down by a reduction factor. Next, the results calculated for the individual resource dimensions are aggregated to obtain the scaling action of the service. That is, if any resource dimension needs to be scaled up or down, the scaling action is executed, and the maximum scaling factor over all resource dimensions is used as the service scaling factor, which gives the service expansion factor and the service reduction factor. Finally, based on the new number of replicas, new Deployment configurations are generated for all services that need to be scaled up or down, and then distributed to the microservice control node for execution. This enables coordinated scaling and rapid response for all attacked nodes in the microservice system.

[0063] Step (5), strategy iteration and optimization, is as follows. After the defender obtained in steps (2) and (3) is deployed in the actual environment, the defense strategy needs to be fine-tuned and updated based on the actual defense effect of the system. That is, when a new attack mode appears in the system, the defense effect continues to decline, or the service topology is adjusted, the incremental training process is triggered. Lightweight alternating training is performed starting from the original network parameters, and the latest defense strategy is obtained by updating only some of the network weights. This closed-loop optimization mechanism can avoid strategy oscillation and achieve adaptive defense capability.

[0064] Specifically, for step 1: resource monitoring and environmental modeling, the details are as follows:

[0065] Step (1.1) Metrics Collection and Reporting: Use the K8S Metrics API to report the resource information of nodes and services, each of which can be represented as a resource vector, and collect the service resource vectors. The resource consumption of a microservice is related to its traffic volume, and different services consume different resources. Besides the real-time resource vector, the number of replicas can also be collected via the Kubernetes Metrics API; in addition, custom metric probes can be deployed within the service to collect other data.

[0066] Step (1.2) Metrics Collection and Aggregation: The latency, transmission rate, and request failure rate are obtained through aggregation using Prometheus. Aggregating these metrics yields the observation vector of each service.
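
The per-replica aggregation can be illustrated with a minimal sketch. It follows the three aggregation rules stated in claim 2 (sum for resource usage, mean for latency, request-weighted mean for failure rate). The class `ReplicaMetrics`, its field names, and the units are hypothetical names chosen for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicaMetrics:
    """Metrics reported by one replica (field names are illustrative)."""
    cpu: float           # CPU usage of this replica
    memory: float        # memory usage of this replica
    latency_ms: float    # average request latency of this replica
    requests: int        # number of requests handled in the window
    failure_rate: float  # fraction of failed requests, in [0, 1]


def aggregate_service(replicas: List[ReplicaMetrics]) -> Dict[str, float]:
    """Aggregate replica metrics into one service-level observation."""
    if not replicas:
        raise ValueError("a service needs at least one replica")
    total_requests = sum(r.requests for r in replicas)
    # Request-weighted mean; defined as 0.0 when no requests were seen
    # (that convention is an assumption of this sketch).
    if total_requests:
        failure = sum(r.failure_rate * r.requests for r in replicas) / total_requests
    else:
        failure = 0.0
    return {
        "cpu": sum(r.cpu for r in replicas),        # additive
        "memory": sum(r.memory for r in replicas),  # additive
        "latency_ms": sum(r.latency_ms for r in replicas) / len(replicas),  # mean
        "failure_rate": failure,                    # weighted mean
        "replicas": float(len(replicas)),
    }
```

A replica that serves most of the traffic dominates the failure rate, which is why the weighted mean is used there instead of a plain average.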

[0067] Step (1.3) Normalization to resource vector: The aggregated service observation metrics are normalized to obtain the observation vector of the service, which has one component per observation indicator.

[0068] Step (1.4) State Matrix Modeling: By periodically collecting and processing data, a real-time state matrix of the microservice system is obtained, in which each entry is the value of one observation indicator of one service at the current time.
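
Normalization and state matrix construction can be sketched as follows. Claim 2 states that normalization uses the maximum and minimum values each indicator can take, so the sketch uses min-max scaling. Clipping out-of-range values, the function names, and the row ordering by service name are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale value into [0, 1] using the indicator's min and max."""
    if hi <= lo:
        raise ValueError("max must be greater than min")
    clipped = min(max(value, lo), hi)  # clipping is an assumption of this sketch
    return (clipped - lo) / (hi - lo)


def build_state_matrix(
    observations: Dict[str, Dict[str, float]],
    bounds: Dict[str, Tuple[float, float]],
    metric_order: List[str],
) -> List[List[float]]:
    """Return one row per service and one column per observation indicator."""
    matrix = []
    for service in sorted(observations):  # fixed row order across time steps
        row = [
            min_max(observations[service][metric], *bounds[metric])
            for metric in metric_order
        ]
        matrix.append(row)
    return matrix
```

Keeping the row and column order fixed matters because the same matrix layout is fed to the network at every time step.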

[0069] Specifically, the defense algorithm based on MDDQN for step 2 is as follows:

[0070] Step (2.1) State-space encoding: MDDQN consists of a shared layer and multi-dimensional output heads. The system state matrix obtained in step (1) is input into the shared feature extraction layer to extract a high-dimensional state representation.

[0071] Step (2.2) Independent Output by Multiple Heads: The system state matrix is fed through the shared feature extraction layer. Then, an independent output head is set for each action dimension, with each output head corresponding to one dimension of the action space, so that the decision output is a multi-dimensional action vector. In the network output of MDDQN, the k-th head gives the expected value of taking each action under the k-th resource dimension in the current state.

[0072] Step (2.3) Multidimensional Action Output: For strategy selection, an ε-greedy exploration strategy is adopted. It selects a random action independently for each dimension when the exploration rate is high, and selects the current optimal action, i.e., the action with the largest Q-value, independently for each dimension when the exploration rate is low.
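
A minimal sketch of per-dimension ε-greedy selection is given below. It takes the Q-values of each output head as plain lists, so no neural network library is needed. Drawing a separate exploration coin for every dimension is an assumption of this sketch; the patent text only says that actions are chosen independently per dimension.

```python
import random
from typing import List, Optional


def select_actions(
    q_values: List[List[float]],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick one action index per output head with epsilon-greedy exploration."""
    rng = rng or random.Random()
    actions = []
    for head_q in q_values:  # one list of Q-values per action dimension
        if rng.random() < epsilon:
            # Explore: uniform random action for this dimension.
            actions.append(rng.randrange(len(head_q)))
        else:
            # Exploit: action with the largest Q-value in this dimension.
            actions.append(max(range(len(head_q)), key=head_q.__getitem__))
    return actions
```

With `epsilon=0.0` the function is purely greedy, and with `epsilon=1.0` every dimension is sampled uniformly at random.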

[0073] Step (2.4) Multidimensional loss fusion: Calculate the Q-value for each output head and obtain the loss of each dimension from that Q-value and the target value of the k-th dimension, which is used to guide model updates. The losses of all dimensions are summed and averaged to form the mean loss used for backpropagation. This enables rapid convergence of multi-dimensional scaling decisions.
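
The formulas of this step are not reproduced in the text, so the following is only one conventional way to write the loss fusion. The symbols are assumptions: K is the number of output heads, Q_k(s, a_k) is the Q-value of head k for the chosen action a_k in state s, y_k is the target value of dimension k, and a squared error is assumed as the per-dimension loss.

```latex
L_k = \bigl( y_k - Q_k(s, a_k) \bigr)^2 ,
\qquad
L = \frac{1}{K} \sum_{k=1}^{K} L_k
```

Averaging rather than summing keeps the magnitude of the total loss independent of the number of action dimensions.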

[0074] Specifically, step 3, adversarial training based on zero-sum games, is as follows:

[0075] Step (3.1) Modeling and Model Selection of Attackers and Defenders: First, the attack and defense process of the microservice system is modeled as a zero-sum game. The attacker, as an adversarial agent, dynamically selects the attack target and attack intensity based on the environmental state and the defense strategy, thereby perturbing the system. The defender learns and optimizes the defense strategy to ensure system security and service quality. To ensure fairness, both the attacker and the defender are driven by the MDDQN algorithm, and the state matrices of both sides are the same, obtained from the data collection and modeling in step (1).

[0076] Step (3.2) Action Space Construction: The attacker can dynamically change the attack path and the attack intensity. An RDoS attacker launches attacks through the service interfaces exposed by the microservice system, so there is a finite number of selectable attack paths, and attack traffic of a certain intensity can be launched on each path. Each attacker has an upper limit on its total attack resources, i.e., the attack traffic summed over all paths cannot exceed this limit. The attacker's action space therefore consists of vectors that give the attack traffic on each path. The defender resists attacks by deciding multi-dimensional scaling thresholds, i.e., it determines a scaling threshold vector for each dimension of the resource vector, with each threshold taking values in a bounded range; its action space consists of these threshold vectors.
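
The two action spaces can be sketched by enumeration on a small discrete grid. The discretization levels, the function names, and the representation of a defender action as one (bottleneck, target) pair per resource dimension are assumptions for illustration; the patent's exact ranges are not reproduced in the text.

```python
from itertools import product
from typing import List, Tuple


def attacker_actions(
    num_paths: int, levels: List[int], max_total: int
) -> List[Tuple[int, ...]]:
    """All per-path traffic vectors whose total stays within the budget."""
    return [
        action
        for action in product(levels, repeat=num_paths)
        if sum(action) <= max_total  # total attack resource limit
    ]


def defender_actions(
    num_resources: int, thresholds: List[float]
) -> List[Tuple[Tuple[float, float], ...]]:
    """All assignments of a (bottleneck, target) pair to each resource."""
    pairs = [(b, t) for b in thresholds for t in thresholds]
    return list(product(pairs, repeat=num_resources))
```

For example, `attacker_actions(2, [0, 1, 2], 2)` keeps only the traffic vectors over two paths whose sum is at most 2. In a multi-head network each dimension is chosen by its own head, so a budget check like the one above would have to be applied to the combined action.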

[0077] Step (3.3) Reward function based on the zero-sum game: Construct the defender's reward function from system security indicators and service quality. It consists of multiple weighted components, namely the security reward, the number of dangerous services, the link failure rate, the number of idle replicas, and the action execution penalty. The attacker's reward function is the negative of the defender's reward function. This means that a positive gain for one side is always a negative gain of the same size for the other side, which satisfies the zero-sum game constraint.
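
A minimal sketch of such a reward pair follows. The five components are the ones named above; the weight values and the signs (the security reward counted positively, the other four as penalties) are assumptions of this sketch, since the patent's formula is not reproduced in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative weights; the actual values are not given in the text."""
    security: float = 1.0
    dangerous: float = 0.5
    failure: float = 1.0
    idle: float = 0.1
    action: float = 0.05


def defender_reward(
    security_reward: float,
    dangerous_services: int,
    link_failure_rate: float,
    idle_replicas: int,
    action_cost: float,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum: security is rewarded, the other terms are penalties."""
    return (
        w.security * security_reward
        - w.dangerous * dangerous_services
        - w.failure * link_failure_rate
        - w.idle * idle_replicas
        - w.action * action_cost
    )


def attacker_reward(defender_value: float) -> float:
    """Zero-sum constraint: the attacker receives the negated defender reward."""
    return -defender_value
```

Because the attacker's reward is the exact negation, the two rewards always sum to zero for any system state.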

[0078] Step (3.4) Attack and defense confrontation training: Both sides adopt an alternating update strategy. First, the attacker iterates for a number of consecutive rounds, conducting adversarial training to update its attack strategy. Then the defender iterates for a number of consecutive rounds, updating its defense strategy to adapt to the attacks. Training alternates in this way for a total of [number missing] cycles. This process continues until both sides' strategies converge, and the defender finally converges to a robust optimal defense model.
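
The alternating schedule can be sketched with stand-in agents. The `Agent` class only counts update steps and is a hypothetical placeholder for an MDDQN learner; the iteration counts, the cycle count, and the `converged` callback are parameters of the sketch, not values from the patent.

```python
from typing import Callable, List, Tuple


class Agent:
    """Placeholder learner: a real agent would run one MDDQN update here."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.updates = 0

    def update(self) -> None:
        self.updates += 1


def alternating_training(
    attacker: Agent,
    defender: Agent,
    cycles: int,
    attacker_iters: int,
    defender_iters: int,
    converged: Callable[[], bool] = lambda: False,
) -> List[Tuple[int, str]]:
    """Train one side while the other is frozen, then swap, cycle by cycle."""
    schedule = []
    for cycle in range(cycles):
        for _ in range(attacker_iters):  # defender policy is held fixed
            attacker.update()
        schedule.append((cycle, attacker.name))
        for _ in range(defender_iters):  # attacker policy is held fixed
            defender.update()
        schedule.append((cycle, defender.name))
        if converged():  # optional early stop once both policies settle
            break
    return schedule
```

Freezing one side during the other side's phase gives each learner a stationary opponent within a phase, which is the usual motivation for alternating updates.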

[0079] Specifically, for step 4: scaling up and down deployment based on multiple resource thresholds, the details are as follows:

[0080] Step (4.1) Multidimensional action vector input: First, the defense action vector obtained from the decision in step (3) is taken as the input of this module. For each resource dimension it contains a threshold vector consisting of the bottleneck threshold and the target threshold. The former is used to determine whether a service needs to be scaled up or down, while the latter represents the resource level that the service is expected to reach after scaling up or down.

[0081] Step (4.2) Scaling-up and scaling-down strategies for each resource dimension: For each resource dimension, parse the bottleneck threshold and the target threshold, and determine the scaling action from the relationship between the real-time observed indicators and the target threshold. If the real-time observed metrics of a service are below the bottleneck threshold in all dimensions, no action is taken. If the real-time observed index is less than or equal to the target threshold, the service is scaled up by an expansion factor; otherwise it is scaled down by a reduction factor.

[0082] Step (4.3) Service-level scaling up / scaling down strategy: At the service level, aggregate the scaling decisions of the individual resource dimensions. That is, if any resource dimension needs scaling up or down, a scaling action is executed. The maximum scaling multiple over all resource dimensions is used as the service scaling multiple, which gives the service expansion factor and the service reduction factor.
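
The joint decision can be sketched for the scale-up case. Because the patent's factor formulas are not reproduced in the text, the sketch assumes a proportional rule in the style of the Kubernetes Horizontal Pod Autoscaler: a dimension whose observed value exceeds its bottleneck threshold proposes the factor observed / target, and the service uses the largest proposed factor. Scale-down is left out, and all names are hypothetical.

```python
import math
from typing import Dict, Tuple


def scale_service(
    replicas: int,
    observed: Dict[str, float],
    thresholds: Dict[str, Tuple[float, float]],
) -> int:
    """Return the new replica count for one service (scale-up only)."""
    factors = []
    for resource, value in observed.items():
        bottleneck, target = thresholds[resource]
        if value > bottleneck:
            # Assumed rule: bring this dimension back to its target level.
            factors.append(value / target)
    if not factors:
        return replicas  # every dimension is below its bottleneck threshold
    # Any dimension can trigger scaling; the largest factor wins.
    return max(1, math.ceil(replicas * max(factors)))
```

Taking the maximum over dimensions means the most constrained resource determines the replica count, so no single dimension is left above its target after scaling.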

[0083] Step (4.4) Service Deployment Configuration Update and Distribution: Based on the new number of replicas, generate new Deployment configurations for all services that need to be scaled up or down, and then distribute them to the microservice control node for execution. This enables coordinated scaling and rapid response for all attacked nodes in the microservice system.
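
As a minimal sketch of the configuration update, the function below builds one patch body per service whose replica count changes. It relies only on the fact that a Kubernetes Deployment stores its replica count in `spec.replicas`; the function name is hypothetical, and actually sending the patches to the cluster is outside the sketch.

```python
from typing import Dict


def replica_patches(
    current: Dict[str, int], desired: Dict[str, int]
) -> Dict[str, dict]:
    """Build a Deployment patch body for each service that must be rescaled."""
    patches = {}
    for name, new_count in desired.items():
        if current.get(name) != new_count:
            # Only the replica count changes; the rest of the spec is untouched.
            patches[name] = {"spec": {"replicas": new_count}}
    return patches
```

Skipping services whose replica count is unchanged keeps the update limited to the services that actually need scaling.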

[0084] Specifically, for step 5: strategy iteration and optimization, the details are as follows:

[0085] Step (5.1) Defense Effectiveness and System Status Monitoring: Deploy the defense strategies obtained through steps (2) and (3) in the actual operating environment, continuously monitor system security and service quality indicators, and trigger the strategy optimization process when new attack patterns are detected, defense effectiveness declines, or service topology is adjusted.

[0086] Step (5.2) Lightweight training for model update: Incremental lightweight alternating training is performed based on the original model parameters, updating only some network weights to obtain the latest defense strategy, thereby achieving closed-loop optimization, avoiding policy oscillation and improving the system's adaptive defense capability.
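
Updating only some of the network weights can be sketched with a plain gradient step over named parameter groups. The patent does not say which weights are updated; freezing the shared layer and training only the output heads, the parameter naming scheme, and the learning rate are assumptions of this sketch.

```python
from typing import Dict, List, Sequence


def incremental_update(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    trainable_prefixes: Sequence[str],
    lr: float = 0.01,
) -> Dict[str, List[float]]:
    """Apply one gradient step to the selected parameter groups only."""
    prefixes = tuple(trainable_prefixes)
    updated = {}
    for name, values in params.items():
        if name.startswith(prefixes):
            # Trainable group: plain gradient descent step.
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            # Frozen group: weights are copied unchanged.
            updated[name] = list(values)
    return updated
```

Starting from the deployed parameters and touching only a subset of them keeps retraining cheap and limits how far the updated strategy can drift from the one already in service.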

[0087] It should be noted that the above embodiments are not intended to limit the scope of protection of the present invention. Equivalent transformations or substitutions made based on the above technical solutions all fall within the scope of protection of the claims of the present invention.

Claims

1. A proactive defense method against adversarial denial-of-service attacks in a microservice scenario, characterized in that the defense method includes the following steps: Step (1): resource monitoring and environmental modeling; Step (2): defense algorithm based on MDDQN; Step (3): adversarial training based on a zero-sum game; Step (4): scaling up and down deployment based on multiple resource thresholds; Step (5): strategy iteration and optimization. Step (1), resource monitoring and environmental modeling, is as follows: an observation metric collection application is deployed for each node and service replica in the microservice system. This application uses the K8S Metrics API and Istio's native metric collection interface, or a custom metric collection probe, to report the observed metrics of the deployed application, including resource usage, latency, and packet loss information. Then, a centralized processing program periodically collects and aggregates the observed metrics of all replicas of all services to obtain each service's total resource usage, average latency, and request failure rate, and normalizes them into resource vector form. Finally, the metric vectors of all service nodes are modeled as a state matrix representation of the microservice system at the current moment. Step (2), the defense algorithm based on MDDQN, is as follows: the MDDQN method inputs the system state matrix into a shared feature extraction layer, and then sets an independent output head for each action dimension. Each output head corresponds to one dimension of the action space, so that the decision is output as a multi-dimensional action vector; in terms of strategy selection, an ε-greedy exploration strategy is adopted, which selects random actions independently for each dimension at a high exploration rate, and selects the action with the largest Q-value in each dimension at a low exploration rate. 
For loss propagation, each output head calculates its Q-value and obtains the loss for each dimension, then uses the mean of all dimensions as the total loss for backpropagation. This achieves rapid convergence of multi-dimensional scaling decisions. Step (3): Adversarial training based on zero-sum game, specifically as follows: First, the attack and defense adversarial process of the microservice system is modeled as a zero-sum game problem. The attacker, as an adversarial agent, dynamically selects the attack target and attack intensity according to the environmental state and defense strategy, and applies perturbation to the system. The defender learns and optimizes the defense strategy to ensure system security and service quality. The system state matrix obtained in step (1) is used as the input of both the attacker and the defender, and the current action decision is output through the action state transition function. Different action spaces need to be constructed for the attacker and the defender. Finally, the defender's reward function is constructed based on the system security indicators and service quality. The attacker's reward function is the inverse of the defender's reward function. The attacker and the defender adopt an alternating update strategy. In a specified round, one side's strategy is fixed, and the other side's strategy function is iterated. This alternating update is repeated. After multiple rounds of alternating iteration, the defender obtains the robust optimal defense strategy. Step (4): Scaling up and down deployment based on multiple resource thresholds, specifically as follows. This method not only considers a single CPU or memory load, but also integrates multiple resource thresholds to construct a joint judgment mechanism. First, the defense action vector obtained from step (3) is used as the input of the scaling up and down deployment module. 
For each resource dimension, the bottleneck threshold and the target threshold are first parsed. The former is used to determine the services that need to be scaled up or down, and the latter represents the resource level that the service is expected to reach after scaling up or down. Then, the scaling strategy is decided by the joint decision over multiple resource thresholds. For each service, the relationship between the bottleneck threshold and the target threshold is first used to determine whether to scale up or scale down; whether a scaling action is needed is then determined from the real-time resource observation metrics obtained in step (1) and the bottleneck threshold; and the scaling factor is calculated based on the ratio of the bottleneck threshold to the target threshold. The results from multiple resource dimensions are aggregated to obtain the service scaling action. That is, if any resource dimension needs scaling up or down, the scaling action is executed, and the maximum scaling factor across all resource dimensions is used as the service scaling factor. Finally, a new deployment configuration is generated for all services requiring scaling up or down, and then distributed to the microservice control node for execution. This achieves coordinated scaling and rapid response for all attacked nodes in the microservice system. Step (5), strategy iteration and optimization, is as follows: after the defender obtained in steps (2) and (3) is deployed in the actual environment, the defense strategy needs to be fine-tuned and updated based on the actual defense effect of the system. That is, when a new attack mode appears in the system, the defense effect continues to decline, or the service topology is adjusted, the incremental training process is triggered. 
Lightweight alternating training is performed based on the original network parameters, and the latest defense strategy is obtained by updating some network weights. This closed-loop optimization mechanism can avoid strategy oscillation and achieve adaptive defense capability.

2. The proactive defense method against adversarial denial-of-service attacks in a microservice scenario according to claim 1, characterized in that step (1), resource monitoring and environmental modeling, is carried out as follows: Step (1.1) Metric Collection and Reporting: deploy the observation metric collection application on each node of the microservice system and on its service replicas, the microservice system comprising a set of nodes; the data collection application is implemented through the Kubernetes Metrics API, the Istio native metrics interface, or a custom probe, and is used to collect operational metrics including resource usage, latency, and packet loss rate. Step (1.2) Metric Collection and Aggregation: a centralized processing program collects the operational metrics from all nodes and service replicas at preset intervals, and the data from the multiple replicas of the same service is aggregated using three aggregation methods: the total resource usage of a service is aggregated by summing the resource usage metric over its replicas; the average latency of a service is aggregated by taking the mean of the latency over its replicas; and the request failure rate of a service is aggregated as a weighted average over its replicas, with the number of requests of each replica used as the weight. The aggregated indicators of all dimensions finally form the service observation indicator vector, whose length is the total number of observed indicators. Step (1.3) Normalization to resource vector: the aggregated service observation indicator vector is normalized using the maximum and minimum values that each observed indicator can take, which generates the observation vector of the service. Step (1.4) State Matrix Modeling: combine the resource vectors of all service nodes to form the system state matrix, which describes the overall operational status of the microservice system at the current moment; each entry is the value of one observation indicator of one service at the current time.

3. The proactive defense method against adversarial denial-of-service attacks in a microservice scenario according to claim 1, characterized in that step (2), the defense algorithm based on MDDQN, is carried out as follows: Step (2.1) State-space encoding: MDDQN consists of a shared layer and multi-dimensional output heads; the system state matrix obtained in step (1) is input into the shared feature extraction layer to extract a high-dimensional state representation. Step (2.2) Independent Output by Multiple Heads: the system state matrix is fed through the shared feature extraction layer; then, an independent output head is set for each action dimension, with each output head corresponding to one dimension of the action space, so that the decision output is a multi-dimensional action vector. In the network output of MDDQN, each head gives the expected value of a state-action pair, i.e., of taking an action of the k-th dimension in the current environmental state. Step (2.3) Multidimensional Action Output: for strategy selection, an ε-greedy exploration strategy is adopted, which selects a random action independently for each dimension when the exploration rate is high, and selects the current optimal action independently for each dimension when the exploration rate is low. Step (2.4) Multidimensional Loss Fusion: calculate the Q-value for each output head and obtain the loss of each action dimension from that Q-value and the target value of the k-th dimension action, which is used to guide model updates. The losses of all dimensions are summed and averaged to form the mean loss used for backpropagation. This enables rapid convergence of multi-dimensional scaling decisions.

4. The proactive defense method against adversarial denial-of-service attacks in a microservice scenario according to claim 1, characterized in that step (3), adversarial training based on a zero-sum game, is carried out as follows: Step (3.1) Modeling and Model Selection of Attackers and Defenders: first, the attack and defense process of the microservice system is modeled as a zero-sum game. The attacker, as an adversarial agent, dynamically selects the attack target and attack intensity based on the environmental state and the defense strategy, thereby perturbing the system. The defender learns and optimizes the defense strategy to ensure system security and service quality. To ensure fairness, both the attacker and the defender are driven by the MDDQN algorithm, and the state matrices of the attacker and the defender are the same, obtained from the data collection and modeling in step (1). Step (3.2) Action Space Construction: a resource exhaustion denial-of-service (RDoS) attacker launches attacks through the service interfaces exposed by the microservice system, and changes its strategy by dynamically altering the attack path and the attack intensity. There is a finite number of selectable attack paths, attack traffic of a certain intensity can be launched on each path, and each attacker has an upper limit on its total attack resources. The attacker's action space therefore consists of vectors that give the size of the attack traffic launched on each path. The defender resists attacks by deciding multi-dimensional scaling thresholds, i.e., it determines a scaling threshold vector for each dimension of the resource vector, with each threshold taking values in a bounded range; its action space consists of these threshold vectors. Step (3.3) Reward function based on the zero-sum game: construct the defender's reward function from system security indicators and service quality. 
It consists of multiple weighted components, namely the security reward, the number of dangerous services, the link failure rate, the number of idle replicas, and the action execution penalty. The attacker's reward function is the negative of the defender's reward function. This means that a positive gain for one side is always a negative gain of the same size for the other side, which satisfies the zero-sum game constraint. Step (3.4) Attack and defense confrontation training: both sides adopt an alternating update strategy. First, the attacker iterates for a number of consecutive rounds, conducting adversarial training to update its attack strategy. Then the defender iterates for a number of consecutive rounds, updating its defense strategy to adapt to the attacks. Training alternates in this way for a total of [number missing] cycles. This process continues until both sides' strategies converge, and the defender finally converges to a robust optimal defense model.

5. The proactive defense method against adversarial denial-of-service attacks in a microservice scenario according to claim 1, characterized in that step (4), scaling up and down deployment based on multiple resource thresholds, is carried out as follows: Step (4.1) Multidimensional action vector input: first, the defense action vector obtained from the decision in step (3) is taken as the input of the scaling deployment module. For each resource dimension it contains a threshold vector consisting of the bottleneck threshold and the target threshold. The former is used to determine whether a service needs to be scaled up or down, while the latter represents the resource level that the service is expected to reach after scaling up or down. Step (4.2) Scaling-up and scaling-down strategies for each resource dimension: for each resource dimension, parse the bottleneck threshold and the target threshold, and determine the scaling action from the relationship between the real-time observed indicators and the target threshold. If the real-time observed metrics of a service are below the bottleneck threshold in all dimensions, no action is taken. If the real-time observed index is less than or equal to the target threshold, the service is scaled up by an expansion factor; otherwise it is scaled down by a reduction factor. Step (4.3) Service-level scaling up / scaling down strategy: at the service level, aggregate the scaling decisions of the individual resource dimensions. That is, if any resource dimension needs scaling up or down, a scaling action is executed. The maximum scaling multiple over all resource dimensions is used as the service scaling multiple, which gives the service expansion factor and the service reduction factor. Step (4.4) Service deployment configuration update and distribution: based on the new number of replicas, generate new Deployment configurations for all services that need to be scaled up or down, and then distribute them to the microservice control node for execution, so as to realize coordinated scaling and rapid response for all attacked nodes in the microservice system.

6. The proactive defense method against adversarial denial-of-service attacks in a microservice scenario according to claim 1, characterized in that, Step 5: Strategy iteration and optimization, the specific process is as follows: Step (5.1) Defense Effectiveness and System Status Monitoring: Deploy the defense strategies obtained through steps (2) and (3) in the actual operating environment, continuously monitor system security and service quality indicators, and trigger the strategy optimization process when new attack patterns are detected, defense effectiveness declines, or service topology is adjusted. Step (5.2) Lightweight training for model update: Incremental lightweight alternating training is performed based on the original model parameters, updating only some network weights to obtain the latest defense strategy, thereby achieving closed-loop optimization, avoiding policy oscillation and improving the system's adaptive defense capability.

7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that: When the processor executes the program, it implements the proactive defense method against adversarial denial-of-service attacks in a microservice scenario as described in any one of claims 1 to 6 above.

8. A computer-readable storage medium storing computer instructions thereon, characterized in that: When executed by the processor, the computer instruction implements the proactive defense method against adversarial denial-of-service attacks in a microservice scenario as described in any one of claims 1-6.