An adaptive network security monitoring method and system integrating adversarial driving and cross-modal fusion

By generating adversarial examples through a collaborative framework of generative adversarial networks and reinforcement learning, and by combining multimodal graph neural networks with deep reinforcement learning, the invention achieves efficient detection of, and automated defense against, advanced persistent threats. This addresses the shortcomings of existing network security monitoring systems and improves detection accuracy and response speed.

CN122226352A · Pending · Publication Date: 2026-06-16 · HUBEI POLYTECHNIC UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications (China)
Current Assignee / Owner
HUBEI POLYTECHNIC UNIV
Filing Date
2026-03-11
Publication Date
2026-06-16

AI Technical Summary

Technical Problem

Existing network security monitoring systems lack the ability to detect unknown threats, struggle to cope with advanced persistent threats, and lack effective multi-source data fusion and intelligent adaptive defense mechanisms, resulting in low detection accuracy, high false alarm rate, and slow response speed.

Method used

Adversarial examples are generated using a collaborative framework of generative adversarial networks and reinforcement learning. Cross-modal feature extraction and threat detection are performed through multimodal graph neural networks, and automated closed-loop defense is achieved using deep reinforcement learning, including dynamically isolating affected assets and deploying decoy systems.

Benefits of technology

It improves detection accuracy and response efficiency for advanced persistent threats, reduces the false alarm rate, achieves millisecond-level automated response from detection to policy adjustment, adapts dynamically, and can proactively respond to unknown threats.

✦ Generated by Eureka AI based on patent content.

Figure CN122226352A_ABST

Abstract

The present application relates to the technical field of network security, and in particular to an adaptive network security monitoring method and system driven by adversarial confrontation and cross-modal fusion. A collaborative framework of a generative adversarial network and reinforcement learning dynamically generates adversarial samples of advanced persistent threats and simulates the attack-defense response process. A multimodal graph neural network performs unified graph modeling and feature fusion of network traffic, system logs, and asset topology, enabling accurate threat detection. An adversarial training mechanism uses the adversarial samples to improve the detection model's robustness against covert and variant attacks. A deep reinforcement learning agent is deployed that learns automatically from the detection results and executes defense strategies such as dynamic isolation and decoy deployment, forming an automated closed-loop defense. The present application proactively strengthens defensive capability through adversarial driving, deepens threat perception through cross-modal fusion, and achieves intelligent adaptive response through reinforcement learning, effectively solving the problems of passive lag, high false alarm rates, and slow response in traditional security monitoring.

Description

Technical Field

[0001] This invention relates to the field of network security technology, and specifically to an adaptive network security monitoring method and system that integrates adversarial driving and cross-modal fusion.

Background Technology

[0002] With the continuous evolution of cyberattack techniques, sophisticated attack methods such as advanced persistent threats (APTs) are becoming increasingly rampant, posing a severe challenge to traditional network security monitoring systems. Existing technologies primarily rely on signature-based detection systems and rule-based defense strategies, but these methods have significant limitations. First, they are inherently passive and lagging, heavily dependent on updates to known attack signatures, which leaves them ill-equipped to deal with zero-day vulnerability attacks or carefully disguised, continuously evolving unknown threats. Attackers can easily bypass detection through obfuscation, encryption, or mimicking normal traffic. Second, current security monitoring systems often analyze multi-source heterogeneous data such as network traffic, host logs, and asset topology in isolation, lacking effective deep fusion mechanisms. This "data silo" phenomenon makes it difficult to discover, from a global perspective, covert attack clues that span systems and stages and have low signal-to-noise ratios, resulting in low detection accuracy and high false positive rates. Finally, there is a disconnect between detection and response. Most systems still require security analysts to conduct manual assessments and make response decisions after alerts, leading to slow response times and low efficiency and failing to meet the need for immediate, automatic containment of rapidly spreading cyberattacks. While existing research has attempted to apply machine learning to threat detection, the models themselves are vulnerable to adversarial attacks, lack robustness, and have not formed an intelligent adaptive closed loop from perception and decision-making to execution.
Therefore, there is an urgent need for a new network security monitoring solution that can proactively respond to unknown threats, deeply integrate multi-source information, and achieve intelligent and automated closed-loop defense.

[0003] Therefore, the existing technology still needs further development.

Summary of the Invention

[0004] The purpose of this invention is to overcome the above-mentioned technical deficiencies and provide an adaptive network security monitoring method and system that integrates adversarial driving and cross-modal fusion, so as to solve the problems existing in the prior art.

[0005] To achieve the above-mentioned technical objectives, according to a first aspect of the present invention, the present invention provides an adaptive network security monitoring method that integrates adversarial driving and cross-modal fusion, comprising:

[0006] S1. Dynamically generate adversarial examples of advanced persistent threats through a collaborative framework of generative adversarial networks and reinforcement learning, and simulate attack responses based on response strategies.

[0007] S2. Utilize multimodal graph neural networks to uniformly model network traffic data, system log data, and asset topology data, and extract cross-modal features and perform threat detection through graph convolution operations;

[0008] S3. Introduce an adversarial training mechanism and use the adversarial examples to perform adversarial training on the multimodal graph neural network to enhance the model's robustness against covert and mutation attacks.

[0009] S4. Deploy a deep reinforcement learning agent to automatically learn and execute defense strategy adjustment actions based on the threat detection results, thereby achieving automated closed-loop defense from detection to strategy adjustment. The defense strategy adjustment actions include dynamically isolating affected assets and deploying decoy systems.

[0010] Specifically, the collaborative framework of generative adversarial networks and reinforcement learning in S1 includes:

[0011] The generator network receives random noise vectors and current network state features as input and outputs adversarial traffic samples simulating advanced persistent threats. The discriminator network distinguishes between real network traffic samples and the generated adversarial traffic samples and outputs the discrimination probability. The reinforcement learning agent interacts with the generator network and the discriminator network, using the adversarial traffic samples as a simulated attack environment, and learns the optimal response strategy through a state-action-reward loop. The response strategy includes real-time blocking of malicious connections, traffic redirection to a sandbox, or triggering an alarm escalation mechanism. The collaborative framework dynamically optimizes the adversarial sample generation and response strategy by alternately training the generator, discriminator, and reinforcement learning agent.

[0012] Specifically, the generator network adopts a deep neural network architecture, including multiple fully connected layers and convolutional layers, for learning advanced attack pattern features from random noise and network state; the discriminator network includes a graph attention layer and a fully connected classification layer, for combining the output of the multimodal graph neural network to distinguish samples; the reinforcement learning agent is designed based on a deep deterministic policy gradient algorithm, its state space is jointly composed of the threat detection output of the multimodal graph neural network and the network topology state, its action space includes various defensive operations, and the reward function is dynamically adjusted based on the attack mitigation effect and false positive penalty to achieve continuous policy optimization.

[0013] Specifically, the process of generating adversarial examples further includes:

[0014] Adversarial traffic samples output by the generator network are injected into real network traffic data through adversarial perturbations to form a hybrid training dataset. The discriminator network is used to evaluate the hybrid dataset to generate an adversarial loss signal. Combined with the reward signal of the reinforcement learning agent, the parameters of the generator network are optimized to generate more realistic and diverse attack samples, thereby improving the generalization ability of the multimodal graph neural network in adversarial training and enhancing its robustness in detecting unknown threats.
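
The mixing step in [0014] can be pictured with the minimal Python sketch below. It assumes that flows are already encoded as fixed-length feature vectors. The 30% share of generated samples and the `real`/`generated` source tags are hypothetical illustration choices: the source specifies neither a mixing ratio nor the exact way the perturbation is injected.

```python
import random
from typing import List, Sequence, Tuple

Sample = List[float]


def build_hybrid_dataset(
    real: Sequence[Sample],
    generated: Sequence[Sample],
    adv_fraction: float = 0.3,  # hypothetical share of generated samples
    seed: int = 0,
) -> List[Tuple[Sample, str]]:
    """Mix real flows with generator output, tagging each sample with its source.

    The tags are what a discriminator would be scored against when it
    evaluates the hybrid dataset.
    """
    if not 0.0 <= adv_fraction < 1.0:
        raise ValueError("adv_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Number of generated samples needed for them to form adv_fraction of the total.
    n_adv = min(round(len(real) * adv_fraction / (1.0 - adv_fraction)), len(generated))
    picked = rng.sample(list(generated), n_adv)
    data = [(list(x), "real") for x in real] + [(list(x), "generated") for x in picked]
    rng.shuffle(data)  # interleave so batches see both sources
    return data


if __name__ == "__main__":
    real_flows = [[0.1 * i, 0.2] for i in range(7)]
    fake_flows = [[0.9, 0.05 * i] for i in range(10)]
    mixed = build_hybrid_dataset(real_flows, fake_flows)
    print(len(mixed), sum(tag == "generated" for _, tag in mixed))
```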

[0015] Specifically, the unified modeling by the multimodal graph neural network in S2 includes:

[0016] Network traffic data is transformed into graph node features, system log data into time-series edge features, and asset topology data into graph structure connections. Node features are aggregated through graph convolutional network layers, and the feature representations of different modalities are weighted and fused using a graph attention mechanism. A cross-modal fusion module concatenates the traffic, log, and topology feature vectors, or computes their weighted sum, to generate a unified graph embedding representation. Based on this graph embedding representation, a multilayer perceptron classifier performs threat detection and outputs attack probabilities and type labels.
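
The sketch below shows both fusion options named in [0016], concatenation and an attention-weighted sum, applied to a single node's per-modality embeddings. It is a minimal plain-Python illustration. The query vector that scores each modality stands in for parameters that a real graph attention layer would learn. The modality names are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fuse(modalities: Dict[str, Vector]) -> Vector:
    """Concatenate traffic, log and topology embeddings in insertion order."""
    return [x for vec in modalities.values() for x in vec]


def attention_fuse(modalities: Dict[str, Vector], query: Vector) -> Vector:
    """Weighted sum of same-sized embeddings; weights = softmax(query . embedding)."""
    vectors = list(modalities.values())
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("attention fusion needs embeddings of equal dimension")
    weights = softmax([sum(q * x for q, x in zip(query, v)) for v in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


if __name__ == "__main__":
    node = {"traffic": [1.0, 0.0], "logs": [0.0, 1.0], "topology": [1.0, 1.0]}
    print(concat_fuse(node))                 # [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    print(attention_fuse(node, [0.0, 0.0]))  # equal weights, so the mean of the three
```

Concatenation keeps every modality's features intact but grows the embedding size. The weighted sum keeps the size fixed and lets the scores decide which modality dominates for a given node.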

[0017] Specifically, the cross-modal feature fusion method further includes:

[0018] Unsupervised pre-training of multimodal graph data is performed using a graph autoencoder to learn latent feature representations; a gating mechanism is introduced during the fusion process to dynamically adjust the contribution weights of each modality feature; global graph-level features are extracted through graph pooling operations and combined with local node features for multi-scale threat analysis; and temporal graph neural networks are used to process the sequence dependencies of log data to capture the temporal evolution patterns of attack behavior, thereby improving the detection accuracy and context awareness of advanced persistent threats.
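
The gating and pooling ideas in [0018] can be illustrated with the minimal sketch below. It assumes one sigmoid gate per modality, computed from that modality's own features with hypothetical gate parameters, and it uses mean pooling as the graph pooling operation. The source names neither the gate form nor the pooling operator. The graph autoencoder and temporal graph network parts are not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    features: Sequence[Vector],
    gate_weights: Sequence[Vector],
    gate_bias: Sequence[float],
) -> Vector:
    """Sum of modality vectors, each scaled by g_m = sigmoid(w_m . h_m + b_m)."""
    dim = len(features[0])
    fused = [0.0] * dim
    for h, w, b in zip(features, gate_weights, gate_bias):
        g = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)  # gate in (0, 1)
        for i in range(dim):
            fused[i] += g * h[i]
    return fused


def mean_pool(node_embeddings: Sequence[Vector]) -> Vector:
    """Graph-level feature: column-wise mean over all node embeddings."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


def multiscale_features(node_embeddings: Sequence[Vector]) -> List[Vector]:
    """Append the global (pooled) vector to every local node vector."""
    pooled = mean_pool(node_embeddings)
    return [list(h) + pooled for h in node_embeddings]
```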

[0019] Specifically, the adversarial training mechanism in S3 includes:

[0020] When training the multimodal graph neural network, real network data and generated adversarial example data are used alternately; an adversarial loss function combining classification loss and adversarial perturbation loss is designed, and the network parameters are updated through gradient backpropagation; an adversarial perturbation method generates the perturbations while keeping the adversarial examples within the effective attack range; and through iterative training, the multimodal graph neural network becomes insensitive to input perturbations while maintaining high detection accuracy, thereby enhancing the model's robustness against covert attacks and adversarial evasion techniques.
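
The source does not name the perturbation method used in [0020]. The sketch below substitutes the fast gradient sign method (FGSM), a common choice. It applies FGSM to a single logistic-regression detector in place of the multimodal graph neural network, so that the input gradient can be written in closed form. The combined loss is assumed to be alpha·(clean loss) + (1 − alpha)·(adversarial loss). The values of `epsilon`, `alpha`, and the learning rate are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def predict(w: Vector, b: float, x: Vector) -> float:
    """Attack probability from a linear logit (numerically stable sigmoid)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(p: float, y: int) -> float:
    eps = 1e-12  # avoid log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))


def fgsm(w: Vector, b: float, x: Vector, y: int, epsilon: float) -> Vector:
    """Move x by epsilon along the sign of dLoss/dx, staying in an L-inf ball."""
    p = predict(w, b, x)
    grad_x = [(p - y) * wi for wi in w]  # closed form for logistic regression
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad_x)]


def adversarial_step(
    w: Vector, b: float, x: Vector, y: int,
    epsilon: float = 0.1, alpha: float = 0.5, lr: float = 0.1,
) -> Tuple[Vector, float, float]:
    """One gradient step on alpha*L(x) + (1 - alpha)*L(x_adv); returns (w, b, loss)."""
    x_adv = fgsm(w, b, x, y, epsilon)  # treated as a constant input during the update
    p_clean, p_adv = predict(w, b, x), predict(w, b, x_adv)
    loss = alpha * bce(p_clean, y) + (1 - alpha) * bce(p_adv, y)
    d_clean, d_adv = alpha * (p_clean - y), (1 - alpha) * (p_adv - y)
    new_w = [wi - lr * (d_clean * xi + d_adv * xa) for wi, xi, xa in zip(w, x, x_adv)]
    new_b = b - lr * (d_clean + d_adv)
    return new_w, new_b, loss
```

Here `epsilon` plays the role of the "effective attack range": every adversarial feature stays within `epsilon` of the original.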

[0021] Specifically, the design of the deep reinforcement learning agent in S4 includes:

[0022] The agent's state space is defined by the threat detection output of the multimodal graph neural network, real-time network performance metrics, and asset topology changes; the action space includes combinations of actions such as dynamic isolation, decoy deployment, traffic rate limiting, policy routing adjustment, and alarm notification; the reward function is calculated by jointly considering threat mitigation speed, resource overhead, and false alarm rate; and the agent updates its policy using the proximal policy optimization (PPO) algorithm, continuously optimizing the policy network and value network through interaction with the environment to achieve adaptive closed-loop defense.
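
The policy update in [0022] is proximal policy optimization (PPO). The sketch below shows only the clipped surrogate objective at the core of PPO, computed from per-step log-probabilities and advantage estimates. The clipping range of 0.2 is the default commonly used in PPO implementations, not a value taken from the source. The value-network loss and advantage estimation are omitted.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative PPO clipped surrogate objective, averaged over a batch (minimise this)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return -total / len(advantages)


if __name__ == "__main__":
    # A defensive action whose probability rose sharply: the gain is capped at 1 + clip_eps.
    print(ppo_clip_loss([0.5], [0.0], [1.0]))  # -1.2
```

Clipping keeps each update close to the previous policy. This matters when the environment is a live network, where an abrupt policy swing could trigger many isolations at once.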

[0023] Specifically, further optimization of the defense strategy adjustment actions includes:

[0024] The optimal defense combination is selected based on the action probability distribution output by the deep reinforcement learning agent; dynamic isolation actions are implemented by a software-defined network controller that modifies flow table rules in real time to isolate infected assets; for decoy deployment actions, high-interaction honeypot systems are automatically generated and deployed to mislead attackers and collect attack intelligence; a security policy consistency check is introduced during policy adjustment to ensure that defense actions do not disrupt normal network services; and the evaluated defense effectiveness is fed back to the agent as a reward signal through a feedback loop, continuously improving the policy so that it adapts to the ever-changing threat environment.
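
The selection and consistency-check steps in [0024] can be sketched minimally as follows. The sketch assumes that the agent outputs one probability per action type listed in [0022]. It picks every action above a hypothetical probability threshold and then applies a hypothetical consistency rule: never isolate or reroute an asset that hosts a critical service. When nothing passes, it falls back to an alarm notification. Pushing the chosen actions to an SDN controller or a honeypot platform is out of scope here, because that step depends on the specific controller's API.

```python
from typing import Callable, Dict, List, Set

# Action types listed in [0022]; the identifiers themselves are hypothetical.
ACTIONS = ("isolate", "deploy_decoy", "rate_limit", "reroute", "alert")

Checker = Callable[[str, str], bool]


def make_consistency_checker(critical_assets: Set[str]) -> Checker:
    """Hypothetical rule: do not cut off or reroute assets running critical services."""
    def check(action: str, target: str) -> bool:
        return not (action in {"isolate", "reroute"} and target in critical_assets)
    return check


def select_defense(
    probs: Dict[str, float], target: str, check: Checker, threshold: float = 0.2
) -> List[str]:
    """Return consistent actions with probability >= threshold, most likely first."""
    unknown = set(probs) - set(ACTIONS)
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    ranked = sorted(probs, key=probs.__getitem__, reverse=True)
    chosen = [a for a in ranked if probs[a] >= threshold and check(a, target)]
    # An alarm notification does not touch traffic, so it serves as the fallback.
    return chosen or ["alert"]
```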

[0025] According to a second aspect of the present invention, an adaptive network security monitoring system integrating adversarial driving and cross-modal fusion is provided, comprising:

[0026] The adversarial generation and reinforcement learning collaborative module is used to dynamically generate adversarial samples of advanced persistent threats through generative adversarial networks and to combine them with reinforcement learning to simulate the attack response process.

[0027] The multimodal graph neural network modeling module is used to perform unified graph structure modeling on network traffic data, system log data, and asset topology data, extract cross-modal features, and perform threat detection.

[0028] An adversarial training module is used to perform adversarial training on the multimodal graph neural network using the adversarial examples, thereby enhancing the model's robustness against covert attacks.

[0029] The deep reinforcement learning agent module is used to automatically learn and execute defense strategy adjustment actions based on threat detection results, thereby achieving automated closed-loop defense.

[0030] The modules work together to achieve adaptive network security monitoring through adversarial driving and cross-modal fusion.

[0031] Beneficial effects:

[0032] The adaptive network security monitoring method and system with adversarial driving and cross-modal fusion provided by this invention can bring many significant benefits compared with the prior art.

[0033] First, this invention constructs an "adversarial-driven" proactive defense system, fundamentally changing the traditional passive defense model. By integrating adversarial generative networks and reinforcement learning into a collaborative framework, the system can dynamically generate diverse adversarial samples simulating advanced persistent threats. These samples not only continuously "challenge" and enhance the robustness of the detection model, giving it a stronger generalization ability to identify new, variant, and covert attacks, but also serve as a "training" environment for reinforcement learning agents, enabling defense strategies to autonomously learn and evolve in simulated high-intensity adversarial scenarios. This mechanism gives the system a proactive immune capability of "promoting defense through offense and mutual reinforcement between offense and defense," significantly improving the level of early warning and defense against unknown threats.

[0034] Second, this invention achieves deep semantic fusion and correlation analysis of multi-dimensional data in cyberspace through unified modeling based on multimodal graph neural networks. By mapping heterogeneous data such as traffic, logs, and topology to a unified graph structure for representation learning, it can naturally depict the complex interaction relationships between entities and attack propagation paths. The application of graph convolution and attention mechanisms enables the model to accurately capture local anomalies and global patterns, effectively identifying low-rate, slow-spreading advanced attack activities that are difficult to detect using traditional methods. This significantly improves the accuracy of threat detection and early detection capabilities, while reducing false positives and false negatives.

[0035] Third, this invention utilizes deep reinforcement learning to achieve a seamless closed loop from intelligent detection to automatic decision-making and response. Based on a reinforcement learning agent with comprehensive threat situational awareness, it can comprehensively consider security effectiveness and business costs, automatically learning and outputting optimal combined defense strategies such as dynamic isolation and decoy deployment. This system not only achieves millisecond-level automation of the response process, shortening threat dwell time, but also, through continuous environmental feedback and strategy optimization, enables context-adaptive defense behavior, flexibly responding to dynamic changes in network topology and business load, maximizing business continuity while ensuring security.

[0036] Fourth, overall, this invention organically integrates adversarial generation, multimodal fusion, adversarial training, and reinforcement learning decision-making into a self-evolving and continuously optimizing intelligent security monitoring ecosystem. The components mutually reinforce each other, so that the system's detection accuracy, robustness, decision intelligence, and response efficiency continue to improve through repeated iterations, providing an effective technical path for building a dynamic, proactive, and adaptive next-generation cybersecurity protection system.

Attached Figure Description

[0037] Figure 1 is a flowchart illustrating the adaptive network security monitoring method based on adversarial driving and cross-modal fusion provided in a specific embodiment of the present invention.

Detailed Implementation

[0038] To enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Based on the embodiments in this application, other similar embodiments obtained by those skilled in the art without creative effort should all fall within the scope of protection of this application. Furthermore, directional terms mentioned in the following embodiments, such as "up," "down," "left," and "right," are only for reference to the directions in the accompanying drawings; therefore, the directional terms used are for illustrative purposes and not for limiting the invention.

[0039] The present invention will be further described below with reference to the accompanying drawings and preferred embodiments.

[0040] Referring to Figure 1, this invention provides an adaptive network security monitoring method that integrates adversarial driving and cross-modal fusion, comprising:

[0041] S1. Adversarial examples of advanced persistent threats are dynamically generated through a collaborative framework of adversarial generative networks and reinforcement learning, and attack responses based on response strategies are simulated.

[0042] It should be further explained that the core of this invention lies in constructing an adaptive network security monitoring closed loop that combines "adversarial driving" and "cross-modal fusion." The method starts from S1, namely, constructing a collaborative framework whose core components are a generative adversarial network (GAN) and a reinforcement learning agent. The GAN, particularly its generator, is used to proactively generate adversarial network traffic packet sequences that simulate advanced persistent threat (APT) behaviors. These sequences are not simple random noise but synthetic data that has learned the statistical characteristics and protocol semantics of real attack traffic. Simultaneously, the reinforcement learning agent is placed in a simulated network defense environment whose state is driven by the attack postures of the adversarial samples produced by the generator. The agent's goal is to learn how to choose the optimal response action when facing these simulated attacks, such as immediate blocking, observation and analysis, or decoy deployment.

[0043] S2. Utilize multimodal graph neural networks to uniformly model network traffic data, system log data, and asset topology data, and extract cross-modal features through graph convolution operations for threat detection.

[0044] It should be further explained that S2 is the core of multimodal data fusion and modeling. Network traffic data, system logs, and asset topology relationships are inherently heterogeneous. This invention models them using a unified framework of graph neural networks: each host, server, or network device in the network is considered a node in the graph, and the interaction relationships of network connections or log records are considered edges. The characteristics of traffic packets (such as protocol type, packet size, and frequency) are encoded as node features, the event sequences in the system logs are transformed into edge features with timestamps or time-series node features, and the asset topology (such as VLAN segmentation and access control lists) directly defines the adjacency matrix or connection relationships of the graph. Through graph convolution operations, nodes can aggregate information from their neighbors (i.e., other assets with which they communicate or have logical connections), thereby achieving deep fusion of traffic, logs, and topology information, extracting complex attack pattern features that cannot be captured by single-modal data, and using them for high-precision threat detection.
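
The graph construction and neighbor aggregation described in [0044] can be sketched in plain Python as shown below. The sketch makes three assumptions: connections are undirected, each asset has one numeric feature vector, and aggregation uses the symmetric normalization of the Kipf and Welling graph convolutional network, ReLU(D^-1/2 (A + I) D^-1/2 H W). The source does not fix the normalization, and log-derived edge features are left out. The asset names, features, and weight matrix are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def build_adjacency(nodes: Sequence[str], edges: Sequence[Tuple[str, str]]) -> Matrix:
    """Assets become nodes; each observed connection becomes an undirected edge."""
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for u, v in edges:
        i, j = index[u], index[v]
        adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """ReLU(D^-1/2 (A + I) D^-1/2 H W): mixes each node's features with its neighbours'."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # self-loops guarantee deg >= 1
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [[max(0.0, x) for x in row] for row in matmul(matmul(norm, h), w)]


if __name__ == "__main__":
    hosts = ["web01", "db01", "printer"]
    adj = build_adjacency(hosts, [("web01", "db01")])
    feats = [[1.0], [3.0], [5.0]]  # e.g. a scaled packet-rate feature per host
    print(gcn_layer(adj, feats, [[1.0]]))  # [[2.0], [2.0], [5.0]]
```

The two connected hosts end up sharing information, while the isolated printer keeps its own features. Stacking such layers lets evidence propagate along multi-hop attack paths.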

[0045] S3. Introduce an adversarial training mechanism and use the adversarial samples to perform adversarial training on the multimodal graph neural network to enhance the model's robustness against covert and mutation attacks.

[0046] It should be further explained that S3 is key to improving the model's robustness. Traditional threat detection models experience a sharp decline in performance when encountering carefully crafted attack samples designed to evade detection. This invention incorporates adversarial samples dynamically generated in S1 into the training dataset of the multimodal graph neural network. Through adversarial training, the model is forced to learn the essential characteristics of the attack rather than superficial statistical patterns, thus making it insensitive to small adversarial perturbations in the input data and significantly improving its ability to detect new, variant, or covert attacks.

[0047] S4. Deploy a deep reinforcement learning agent to automatically learn and execute defense strategy adjustment actions based on the threat detection results, thereby achieving automated closed-loop defense from detection to strategy adjustment. The defense strategy adjustment actions include dynamically isolating affected assets and deploying decoy systems.

[0048] It should be further explained that S4 is the final step in achieving the adaptive closed loop. The deep reinforcement learning agent, acting as the decision-making brain, derives its state from the threat detection confidence, attack type, and location information output by the multimodal graph neural network in S2. Based on the current state, the agent outputs a defensive action, such as dynamically isolating suspected compromised hosts through flow table rules issued by a software-defined network controller, or automatically deploying a high-interaction honeypot system as bait in an idle network area. After the action is executed, the environment (i.e., the actual network) produces a new state and a reward (e.g., reflecting whether the attack was successfully contained and the impact of any mis-isolation on business). This reward signal is fed back to the agent to update its policy network, thereby achieving continuous optimization and adaptive adjustment of the policy and forming a complete automated "perception-decision-execution-feedback" defense closed loop.

[0049] Understandably, this invention elevates the adversarial concept from simple model training to the system architecture level. It proactively creates "hypothetical enemies" for training purposes through adversarial generation, learns optimal defense strategies through reinforcement learning in confrontations with these "hypothetical enemies," and ensures the comprehensiveness and accuracy of "battlefield situation" perception through multimodal fusion. This design makes the entire system no longer static and passive, but possesses the ability to dynamically evolve and proactively adapt. It can effectively cope with the long-term, covert, and variable nature of advanced persistent threats, significantly reducing false positives and false negatives caused by reliance on fixed rules or a single data source, and greatly shortening the average time from threat detection to effective response, thus improving the agility and intelligence of overall security protection.

[0050] Specifically, the collaborative framework of generative adversarial network and reinforcement learning in S1 includes: a generator network receiving random noise vectors and current network state features as input, and outputting adversarial traffic samples simulating advanced persistent threats; a discriminator network distinguishing real network traffic samples from the generated adversarial traffic samples, and outputting a discrimination probability; and a reinforcement learning agent interacting with the generator network and the discriminator network, using the adversarial traffic samples as a simulated attack environment, and learning the optimal response strategy through a state-action-reward loop, wherein the response strategy includes real-time blocking of malicious connections, traffic redirection to a sandbox, or triggering an alarm escalation mechanism, and the collaborative framework dynamically optimizes the adversarial sample generation and response strategy by alternately training the generator, discriminator, and reinforcement learning agent.

[0051] It should be further explained that the collaborative framework in S1 is a dynamic system built on a three-player game with alternating training.

[0052] 1. Generator network. The preferred structure is a deep neural network containing multiple fully connected layers and one-dimensional convolutional layers. Its input consists of two parts: a random noise vector sampled from a standard normal distribution, and a vector representing the current network baseline state, which can include basic network traffic statistics (such as average packet rate and number of connections), summaries of deployed security policies, and so on. The output of the generator is a multi-dimensional vector which, after deconvolution and reshaping operations, yields a simulated network traffic sample with time-series characteristics (e.g., a traffic session feature sequence of fixed time length). The generator's goal is to make the generated samples as close as possible to real advanced persistent threat traffic, thereby "deceiving" the discriminator.

[0053] 2. Discriminator network. The structure can be a classifier containing a graph attention layer and a fully connected layer. Its input is a traffic sample, which may be real or generated. The output of the discriminator is a scalar probability value representing its confidence that the input sample is genuine attack traffic. The goal of the discriminator is to accurately distinguish real samples from generated ones.

[0054] 3. Reinforcement learning agent. The agent is based on the Actor-Critic architecture. The state it observes includes not only the current basic state of the network but, more importantly, the attack characteristics (such as attack stage and target port) exhibited, after preliminary analysis, by the adversarial traffic samples produced by the generator, together with the discriminator's classification probability for those samples. The agent's action space can be discrete or hybrid and contains multiple response options, for example: $a_0$, take no action and continue to observe; $a_1$, block the network connection corresponding to the attack traffic in real time; $a_2$, redirect the relevant traffic to a sandbox environment for in-depth behavioral analysis; $a_3$, trigger an advanced alert and notify the security administrator. The environment rewards the agent according to the action taken. The design of the reward function is crucial, and a preferred design is

$$r_t = w_1 R_{\text{mit}} - w_2 C_{\text{fa}} - w_3 C_{\text{res}}$$

where $R_{\text{mit}}$ is the estimated reduction in attack impact after the action is taken (for example, successfully blocking a critical penetration), $C_{\text{fa}}$ is the business interruption cost of actions taken in response to false alarms, $C_{\text{res}}$ is the computational or network resources consumed in performing the action, and $w_1$, $w_2$, $w_3$ are weighting coefficients whose preferred values are, respectively, … These weights are chosen to prioritize security (high $w_1$) while balancing the cost of false alarms (medium $w_2$) and operational efficiency (low $w_3$). The agent's goal is to maximize the cumulative discounted reward $G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}$, where $\gamma$ is a discount factor, usually set to …
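
The reward and return in [0054] translate directly into code, as the sketch below shows. The source's preferred weight values and discount factor are not given in this text. The numbers in the usage lines are therefore hypothetical placeholders, and they respect only the stated ordering: the security weight is high, the false-alarm weight is medium, and the resource weight is low. The three reward terms are the estimated reduction in attack impact, the false-alarm business interruption cost, and the resource cost.

```python
from typing import Sequence


def reward(mitigation: float, false_alarm_cost: float, resource_cost: float,
           w1: float, w2: float, w3: float) -> float:
    """Weighted mitigation benefit minus weighted false-alarm and resource costs."""
    return w1 * mitigation - w2 * false_alarm_cost - w3 * resource_cost


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**k * r_k, accumulated backwards from the last step."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    W1, W2, W3 = 0.6, 0.3, 0.1  # hypothetical weights with w1 > w2 > w3
    GAMMA = 0.99                # hypothetical discount factor
    # Episode: a successful block, then a mis-isolation, then a cheap observation step.
    steps = [(0.9, 0.0, 0.2), (0.0, 0.5, 0.1), (0.0, 0.0, 0.05)]
    rs = [reward(m, fa, res, W1, W2, W3) for m, fa, res in steps]
    print(rs, discounted_return(rs, GAMMA))
```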

[0055] The training process is alternating: First, with the generator and reinforcement learning agent fixed, the discriminator is trained to better distinguish between real data and data generated by the current generator. Then, with the discriminator and reinforcement learning agent fixed, the generator is trained to generate more realistic adversarial examples that can fool the current discriminator. Next, in a dynamically changing "attack environment" constructed by the current generator, the reinforcement learning agent is trained to learn the optimal response strategy. Finally, based on the agent's response performance and the discriminator's judgment, the generator is fine-tuned to generate more challenging examples. This cycle repeats, allowing the capabilities of all three components to evolve together.
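
At its core, the alternating schedule in [0055] is a fixed ordering of four phases. Each phase updates one component while the others stay frozen. The sketch below captures only that control flow. The phase callbacks are hypothetical stand-ins for the actual discriminator, generator, and agent updates.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# (phase name, component updated, components held fixed) in the order given in [0055].
PHASES: Tuple[Tuple[str, str, FrozenSet[str]], ...] = (
    ("train_discriminator", "discriminator", frozenset({"generator", "agent"})),
    ("train_generator", "generator", frozenset({"discriminator", "agent"})),
    ("train_agent", "agent", frozenset({"generator", "discriminator"})),
    # Fine-tune the generator using the agent's response and the discriminator's judgment.
    ("finetune_generator", "generator", frozenset({"discriminator", "agent"})),
)

PhaseFn = Callable[[FrozenSet[str]], None]


def run_alternating_training(rounds: int, phase_fns: Dict[str, PhaseFn]) -> List[str]:
    """Run every phase once per round; each callback is told which components are frozen."""
    trace: List[str] = []
    for _ in range(rounds):
        for name, updated, frozen in PHASES:
            if updated in frozen:
                raise RuntimeError(f"{name} would update a frozen component")
            phase_fns[name](frozen)
            trace.append(name)
    return trace


if __name__ == "__main__":
    noop: PhaseFn = lambda frozen: None  # placeholder update step
    print(run_alternating_training(2, {name: noop for name, _, _ in PHASES}))
```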

[0056] Understandably, this collaborative framework creates an efficient "red-blue team" simulation environment. The generator acts as an ever-evolving "red team" (attacker), aiming to generate attack samples that are increasingly difficult to detect and counter; the reinforcement learning agent acts as the "blue team" (defender), learning the best defense strategy in the face of escalating challenges; and the discriminator acts as a "referee," driving technological progress for both sides. This avoids the limitations of relying on a fixed attack library for training, enabling the defense model to proactively adapt to unknown and new threats.

[0057] Specifically, the generator network adopts a deep neural network architecture, including multiple fully connected layers and convolutional layers, for learning advanced attack pattern features from random noise and network state; the discriminator network includes a graph attention layer and a fully connected classification layer, for combining the output of the multimodal graph neural network to distinguish samples; the reinforcement learning agent is designed based on a deep deterministic policy gradient algorithm, its state space is jointly composed of the threat detection output of the multimodal graph neural network and the network topology state, its action space includes various defensive operations, and the reward function is dynamically adjusted based on the attack mitigation effect and false positive penalty to achieve continuous policy optimization.

[0058] It should be further explained that the specific design and interaction of the generator network, discriminator network, and reinforcement learning agent are the technical foundation for realizing the collaborative framework.

[0059] 1. Generator network: The preferred architecture is as follows.
- Input layer: receives the concatenated vector $[\mathbf{z}; \mathbf{s}]$, where the noise vector $\mathbf{z}$ has dimension 100 and the network-state vector $\mathbf{s}$ has dimension 50.
- Fully connected layers: three layers with 256, 512, and 1024 neurons, respectively, all using the LeakyReLU activation function (negative slope 0.2).
- Transposed convolutional layers: two one-dimensional transposed convolutional (deconvolutional) layers. The first uses 128 filters of size 5 with stride 2; the second uses 64 filters of size 5 with stride 2. A Tanh activation constrains the output values to the interval [-1, 1].
- Output layer: reshapes the feature map into a matrix with the same dimensions as the real traffic-sample features (e.g., 64 features per time step over 100 time steps).

This design lets the generator progressively produce high-dimensional, structured attack-traffic sequence features from low-dimensional noise and high-level state semantics.
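The tensor shapes through the generator of [0059] can be traced with the standard transposed-convolution length formula (the one documented for `torch.nn.ConvTranspose1d`). The source does not say how the 1024-unit output of the last fully connected layer is arranged into (channels, length), nor which padding is used. The reshape to 16 × 64 and the padding/output-padding values below are therefore hypothetical choices.

```python
from typing import List, Tuple

def conv_transpose1d_len(l_in: int, kernel: int, stride: int, padding: int = 0,
                         output_padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D transposed convolution."""
    return (l_in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1

def generator_shapes(reshape_channels: int = 16, padding: int = 2,
                     output_padding: int = 1) -> List[Tuple[str, tuple]]:
    """Trace layer output shapes; reshape_channels/padding/output_padding are assumptions."""
    shapes: List[Tuple[str, tuple]] = [("input [z; s]", (100 + 50,))]
    for units in (256, 512, 1024):
        shapes.append((f"fc{units} + LeakyReLU(0.2)", (units,)))
    channels, length = reshape_channels, 1024 // reshape_channels
    shapes.append(("reshape", (channels, length)))
    for filters in (128, 64):                         # kernel 5, stride 2 in both layers
        length = conv_transpose1d_len(length, kernel=5, stride=2,
                                      padding=padding, output_padding=output_padding)
        shapes.append((f"deconv{filters}", (filters, length)))
    return shapes

for name, shape in generator_shapes():
    print(f"{name:28s} {shape}")
```

Under these assumptions the last deconvolution yields 64 channels × 256 positions. The final output layer would then still have to map this onto the 100 × 64 target, for example with a learned projection or different padding; the source does not specify which.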

[0060] 2. Discriminator network: The preferred architecture is as follows. Traffic data can naturally be represented as a graph, with packets or flows as nodes and sequential or causal relationships as edges. A graph attention layer is therefore introduced to capture the complex dependencies within the traffic.

First, the input traffic sample is transformed into an initial graph representation. It then passes through a multi-head graph attention (GAT) layer with 8 attention heads. Each head computes the attention coefficients between nodes as
$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j])\big)}{\sum_{k \in \mathcal{N}_i} \exp\big(\mathrm{LeakyReLU}(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_k])\big)}, \qquad \mathbf{h}_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \mathbf{W}\mathbf{h}_j\Big),$$
where:
- $\alpha_{ij}$ is the attention coefficient of node $j$ with respect to node $i$;
- $\mathbf{h}_i$ is a node feature vector;
- $\mathbf{W}$ is the shared linear transformation weight matrix;
- $\mathbf{a}$ is the weight vector of the attention mechanism;
- $\|$ denotes concatenation;
- $\mathbf{h}_i'$ is the layer's output for node $i$, aggregated over its neighbor set $\mathcal{N}_i$.

Next, a global pooling layer (such as max pooling) produces the global feature vector of the graph. This vector is fed into two fully connected layers (64 and 1 neurons, respectively). The output layer uses the sigmoid activation function to produce the probability that the sample is real. During training, the discriminator also receives some intermediate features from the multimodal graph neural network as auxiliary information to enhance its discriminative ability.
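One attention head of the formula in [0060] can be sketched in plain Python as below.

Assumptions:
- The LeakyReLU slope of 0.2 and the ELU output nonlinearity come from the common GAT formulation, not from the source.
- Self-loops are included in the neighbor lists.

A multi-head layer would run 8 such heads with separate $\mathbf{W}$ and $\mathbf{a}$ and concatenate (or average) their outputs.

```python
import math
from typing import Dict, List

Vector = List[float]

def matvec(W: List[Vector], x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x

def gat_head(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
             W: List[Vector], a: Vector) -> Dict[int, Dict[int, float]]:
    """alpha[i][j] = softmax over j in N_i of LeakyReLU(a . [W h_i || W h_j])."""
    Wh = {i: matvec(W, x) for i, x in h.items()}
    alpha: Dict[int, Dict[int, float]] = {}
    for i, nbrs in neighbors.items():
        scores = {j: leaky_relu(sum(ak * v for ak, v in zip(a, Wh[i] + Wh[j]))) for j in nbrs}
        m = max(scores.values())                     # subtract the max for numerical stability
        exp = {j: math.exp(s - m) for j, s in scores.items()}
        z = sum(exp.values())
        alpha[i] = {j: e / z for j, e in exp.items()}
    return alpha

def gat_aggregate(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
                  W: List[Vector], alpha: Dict[int, Dict[int, float]]) -> Dict[int, Vector]:
    """h_i' = ELU(sum_j alpha_ij * W h_j); ELU is one common choice for sigma."""
    out: Dict[int, Vector] = {}
    for i, nbrs in neighbors.items():
        agg = [0.0] * len(W)
        for j in nbrs:
            for k, v in enumerate(matvec(W, h[j])):
                agg[k] += alpha[i][j] * v
        out[i] = [v if v > 0 else math.exp(v) - 1.0 for v in agg]
    return out

h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1, 2], 1: [1, 0], 2: [2, 0]}   # self-loops included
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 0.3, 0.2]
alpha = gat_head(h, neighbors, W, a)
h_new = gat_aggregate(h, neighbors, W, alpha)
```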

[0061] 3. Reinforcement learning agent: The algorithm chosen is Deep Deterministic Policy Gradient (DDPG), because it is suited to continuous action spaces. If the action space of this invention is discrete, however, a Deep Q-Network (DQN) or one of its variants (such as Dueling DQN) can be used instead.

The state is composed as $s_t = [\mathbf{d}_t \,\|\, \mathbf{n}_t]$, where:
- $\mathbf{d}_t$ is the vector output by the multimodal graph neural network in S2 after threat detection on the current network snapshot, including the confidence scores of the various attack types, the threat level, and the identifiers of affected assets;
- $\mathbf{n}_t$ is the network topology state vector, including the CPU/memory utilization of critical nodes, the bandwidth utilization of critical links, and so on.

The action space is a multidimensional discrete action vector, for example $\mathbf{a} = (a_0, a_1, a_2, a_3)$ with $a_k \in \{0, 1\}$. Each component determines whether a specific defensive action is performed (0th: isolation; 1st: deploy decoy; 2nd: rate limiting; 3rd: alarm escalation).

The reward function is calculated as
$$r = \lambda_1 M - \lambda_2 F - \lambda_3 C,$$
where:
- $M$ is the attack-mitigation indicator. For example, $M$ = [value missing] if the action successfully blocks an attack confirmed as genuine, and $M$ = [value missing] if the attack succeeds and causes damage.
- $F$ is the false-alarm penalty indicator. For example, $F$ = [value missing] if the action targets a false alarm, i.e., subsequent verification shows the traffic to be normal.
- $C$ is the action overhead, quantified from the resources the action consumes (such as the service interruption time caused by isolation), with values in [0, 1].
- $\lambda_1, \lambda_2, \lambda_3$ are weighting coefficients with preferred values [value missing].

$\lambda_1$ is chosen larger than $\lambda_2$ to emphasize that effective defense matters more than avoiding false alarms, in line with the principle of prioritizing security.
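The state, action, and reward definitions of [0061] can be sketched as follows. The state is a concatenation, the binary action vector is decoded into the defensive operations it enables, and the reward uses the three weighted terms. The $\lambda$ values are hypothetical placeholders, since the preferred values are missing from the text.

```python
from typing import List, Sequence, Tuple

# Index order follows [0061]: 0 isolation, 1 decoy, 2 rate limiting, 3 alarm escalation.
ACTIONS = ("isolate", "deploy_decoy", "rate_limit", "escalate_alarm")

def build_state(detection: Sequence[float], topology: Sequence[float]) -> List[float]:
    """s_t = [d_t || n_t]: detection output followed by topology state."""
    return list(detection) + list(topology)

def decode_action(bits: Sequence[int]) -> List[str]:
    """Map a 4-element 0/1 action vector to the defensive operations it switches on."""
    if len(bits) != len(ACTIONS) or any(b not in (0, 1) for b in bits):
        raise ValueError("expected a 4-element 0/1 vector")
    return [name for name, b in zip(ACTIONS, bits) if b]

def reward(m: float, f: float, c: float,
           lam: Tuple[float, float, float] = (1.0, 0.6, 0.2)) -> float:
    """r = l1*M - l2*F - l3*C; the lambda defaults are hypothetical."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("overhead C must lie in [0, 1]")
    l1, l2, l3 = lam
    return l1 * m - l2 * f - l3 * c

print(decode_action([1, 0, 0, 1]))   # ['isolate', 'escalate_alarm']
```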

[0062] Understandably, such a specific and coordinated design of the generator, discriminator, and reinforcement learning agent ensures the quality of adversarial examples, the accuracy of discriminative capabilities, and the effectiveness of defense strategies. The generator's deep structure enables it to simulate complex attack patterns; the discriminator's graph attention mechanism allows it to analyze the internal structure of traffic more accurately; and the reinforcement learning agent's sophisticated reward function guides it to learn optimal decisions under complex trade-offs (security vs. business continuity).

[0063] Specifically, the adversarial example generation process further includes: injecting adversarial traffic samples output by the generator network into real network traffic data through adversarial perturbation to form a hybrid training dataset; evaluating the hybrid dataset using the discriminator network to generate an adversarial loss signal; and combining the reward signal of the reinforcement learning agent to jointly optimize the parameters of the generator network to generate more realistic and diverse attack samples, thereby improving the generalization ability of the multimodal graph neural network in adversarial training and enhancing its robustness to the detection of unknown threats.

[0064] It should be further explained that the generation of adversarial examples is not an isolated process, but is deeply integrated with the training objectives of the entire system. The specific steps are as follows:

[0065] 1. Construction of a hybrid dataset: Let $\mathcal{D}_{\mathrm{real}}$ be a batch of real network traffic data, containing both normal traffic and known attack traffic. Let the generator network $G$ produce a batch of adversarial traffic samples $\hat{x} = G(\mathbf{z}, \mathbf{s})$.

Adversarial perturbation injection is not a simple replacement but a hybrid strategy. For a real attack traffic sample $x$, its features are linearly interpolated with those of a generated adversarial example $\hat{x}$ in a given proportion $\lambda$ (for example, $\lambda$ = [value missing]):
$$\tilde{x} = \lambda x + (1 - \lambda)\hat{x}.$$
The generated samples $\hat{x}$ are also added to the dataset directly as new samples. Together these form the mixed training set $\mathcal{D}_{\mathrm{mix}} = \mathcal{D}_{\mathrm{real}} \cup \{\tilde{x}\} \cup \{\hat{x}\}$.
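The hybrid-dataset construction of [0065] can be sketched on plain feature vectors.

Assumptions (not stated in the source):
- Each generated sample is paired with a randomly chosen real attack sample.
- Both the interpolated and the raw generated samples are labeled as attacks.
- The value of `lam` is illustrative.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]   # (feature vector, label); label 1 = attack

def interpolate(x: Sequence[float], x_adv: Sequence[float], lam: float) -> List[float]:
    """x_tilde = lam * x + (1 - lam) * x_adv, element-wise."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x, x_adv)]

def build_mixed_set(real: List[Sample], generated: List[List[float]],
                    lam: float, seed: int = 0) -> List[Sample]:
    """D_mix = D_real + interpolated attack samples + raw generated samples."""
    rng = random.Random(seed)
    attacks = [x for x, y in real if y == 1]
    mixed = list(real)
    for g in generated:
        if attacks:                                   # interpolate only if real attacks exist
            mixed.append((interpolate(rng.choice(attacks), g, lam), 1))
        mixed.append((list(g), 1))
    return mixed

real = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]
mixed = build_mixed_set(real, generated=[[0.0, 2.0]], lam=0.7)
print(len(mixed))   # 4
```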

[0066] 2. Adversarial loss calculation: The mixed dataset is input into the discriminator network $D$, and the adversarial loss of the generator $G$ is computed. Its goal is to minimize the probability that generated samples are detected by the discriminator. A commonly used form is the Wasserstein GAN generator loss:
$$\mathcal{L}_{\mathrm{adv}} = -\mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}},\, \mathbf{s} \sim p_{\mathbf{s}}}\big[D(G(\mathbf{z}, \mathbf{s}))\big],$$
where $\mathbb{E}$ denotes expectation, $p_{\mathbf{z}}$ is the noise distribution, and $p_{\mathbf{s}}$ is the network-state distribution. This loss encourages the generator to produce samples that the discriminator considers highly realistic.

[0067] 3. Policy reward guidance: The adversarial loss alone may drive the generator into mode collapse, producing only a few effective attack patterns. To encourage diversity, a reward from the reinforcement learning agent is introduced as additional guidance.

When a generated adversarial example $\hat{x}$ is fed into the simulation environment, the reinforcement learning agent $\pi$ takes actions according to its policy and receives an environmental reward $R$. This reward reflects the "challenge" the adversarial example poses to the defending agent. For example, if the example bypasses initial detection and induces the agent to make an incorrect decision (such as a false isolation), it may yield a high reward, which is a positive incentive for the generator.

The generator's overall optimization objective can therefore be designed as
$$\mathcal{L}_G = \mathcal{L}_{\mathrm{adv}} - \lambda_R \bar{R},$$
where $\mathcal{L}_{\mathrm{adv}}$ is the adversarial loss described above, $\lambda_R$ is a balance coefficient (preferred value 0.1), and $\bar{R}$ is the average expected reward of the reinforcement learning agent when handling this batch of adversarial examples. Gradient descent minimizes the adversarial loss while simultaneously maximizing this reward. The generator is thus encouraged to produce diverse attack samples that both deceive the discriminator and create real difficulties for the defense strategy.
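The generator objectives of [0066] and [0067] can be estimated on one batch as sketched below. The WGAN term is the negated mean critic score, and the reward term is weighted by $\lambda_R = 0.1$. The critic scores and rewards in the example are hypothetical numbers. The sketch computes only the scalar objective; the source does not say how gradients flow through the non-differentiable reward term.

```python
from statistics import fmean
from typing import Sequence

def wgan_generator_loss(critic_scores: Sequence[float]) -> float:
    """Monte Carlo estimate of L_adv = -E[D(G(z, s))] over one batch."""
    return -fmean(critic_scores)

def generator_objective(critic_scores: Sequence[float], agent_rewards: Sequence[float],
                        lambda_r: float = 0.1) -> float:
    """L_G = L_adv - lambda_R * mean(R), with lambda_R = 0.1 as preferred in [0067]."""
    return wgan_generator_loss(critic_scores) - lambda_r * fmean(agent_rewards)

scores = [0.2, -0.4, 1.1]    # hypothetical critic outputs D(G(z, s))
rewards = [2.0, 0.0, 5.0]    # hypothetical challenge rewards from agent rollouts
print(round(generator_objective(scores, rewards), 4))
```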

[0068] 4. Multimodal graph neural network enhancement: These optimized, high-quality adversarial examples $\hat{x}$ are then used for adversarial training of the multimodal graph neural network. When training this detection model, an adversarial-training regularization term is added to the loss function:
$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}}(x, y) + \lambda_{\mathrm{adv}}\, \mathcal{L}_{\mathrm{cls}}(\hat{x}, \hat{y}),$$
where $\mathcal{L}_{\mathrm{cls}}$ is a classification loss (such as cross-entropy), $\hat{y}$ is the label assigned to the adversarial example, and $\lambda_{\mathrm{adv}}$ is the weight of the regularization term (preferred value 0.5). The detection model thus learns to identify real attacks and also to correctly classify these challenging adversarial examples. This improves its generalization and its robustness to unknown threats.
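The regularized loss of [0068] is sketched below with cross-entropy as the classification loss and $\lambda_{\mathrm{adv}} = 0.5$. It assumes the detector already outputs class-probability vectors (for example, after a Softmax).

```python
import math
from typing import Sequence

def cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood of the true class."""
    eps = 1e-12                                   # guards against log(0)
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def adversarial_training_loss(p_real, y_real, p_adv, y_adv, lambda_adv: float = 0.5) -> float:
    """L = L_cls(real batch) + lambda_adv * L_cls(adversarial batch)."""
    return cross_entropy(p_real, y_real) + lambda_adv * cross_entropy(p_adv, y_adv)

loss = adversarial_training_loss([[0.9, 0.1]], [0], [[0.4, 0.6]], [1])
print(round(loss, 4))
```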

[0069] Understandably, this generation process forms a self-reinforcing loop. The generator produces adversarial examples to improve the robustness of the detection model; the more robust detection model, in turn, forces the generator to produce even more difficult and covert examples; simultaneously, the reinforcement learning agent learns better defense strategies through interaction with these upgraded adversarial examples. This closed loop drives the continuous evolution of the entire system's security capabilities.

[0070] Specifically, the unified modeling of multimodal graph neural networks in S2 includes:
- converting network traffic data into graph node features, system log data into temporal edge features, and asset topology data into graph structure connections;
- aggregating node features through graph convolutional network layers, and using graph attention mechanisms to fuse the feature representations of the different modalities with learned weights;
- employing a cross-modal fusion module to concatenate, or take a weighted sum of, the feature vectors of traffic, logs, and topology, generating a unified graph embedding representation;
- based on the graph embedding representation, performing threat detection through a multilayer perceptron classifier that outputs attack probabilities and type labels.

[0071] It should be further explained that S2 is the key to deep fusion of multimodal information and accurate threat detection, and its specific implementation process is as follows:

[0072] 1. Data transformation and graph construction: First, define a heterogeneous information graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$.

The node set $\mathcal{V}$ represents the assets in the network (hosts, servers, routers, etc.). Each node $v_i$ is associated with an initial feature vector $\mathbf{x}_i$. This vector comes primarily from network traffic data, such as statistics for traffic into and out of the node within a set time window (total number of packets, total number of bytes, number of distinct protocols, connection failure rate, etc.). These features are standardized to form the node features.

The edge set $\mathcal{E}$ represents interactions or connections between assets. An edge $e_{ij}$ can be established based on network connectivity (such as IP communication) or logical relationships (such as membership in the same subnet).

System log data is transformed into temporal edge features. For example, suppose login failures, abnormal file accesses, or other log events occur from host $v_i$ to host $v_j$ in time period $t$. The type, frequency, and severity of these events are encoded into a temporal feature vector $\mathbf{e}_{ij}^{(t)}$ attached to edge $e_{ij}$.

Asset topology data (such as network architecture diagrams and access control policies) is used directly to define or constrain the adjacency matrix $\mathbf{A}$ of the graph. For example, $A_{ij}$ is set to 1 only between nodes that are physically connected or whose communication is permitted by policy.
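The three transformations of [0072] can be sketched as follows:
- the adjacency matrix is built from physical links or policy-permitted pairs;
- log events are encoded into per-edge, per-period feature vectors;
- node traffic statistics are z-score standardized.

The event vocabulary and the layout of the edge vector (one count per event type, then the maximum severity) are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

EVENT_TYPES = ("login_failure", "abnormal_file_access")   # hypothetical event vocabulary

def build_adjacency(n: int, links: Iterable[Tuple[int, int]],
                    allowed: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """A_ij = 1 if i and j are physically linked or policy permits them to communicate."""
    A = [[0] * n for _ in range(n)]
    for i, j in set(links) | set(allowed):
        A[i][j] = A[j][i] = 1
    return A

def edge_features(events: Iterable[Tuple[int, int, int, str, float]]
                  ) -> Dict[Tuple[int, int, int], List[float]]:
    """Encode (src, dst, period, type, severity) log events into e_ij^(t)."""
    feats: Dict[Tuple[int, int, int], List[float]] = defaultdict(
        lambda: [0.0] * (len(EVENT_TYPES) + 1))
    for src, dst, period, etype, severity in events:
        vec = feats[(src, dst, period)]
        vec[EVENT_TYPES.index(etype)] += 1.0          # event frequency per type
        vec[-1] = max(vec[-1], severity)              # worst severity seen in the period
    return dict(feats)

def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Column-wise z-score normalization of node traffic statistics."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
```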

[0073] 2. Graph convolution and attention aggregation: Graph convolutional network layers aggregate neighbor information. For node $v_i$, its features after one graph convolution layer can be expressed as
$$\mathbf{h}_i^{(l+1)} = \sigma\Big(\sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{1}{\sqrt{\tilde{d}_i \tilde{d}_j}}\, \mathbf{W}^{(l)} \mathbf{h}_j^{(l)}\Big),$$
where:
- $\sigma$ is an activation function (such as ReLU);
- $\mathcal{N}(i)$ is the neighborhood of node $v_i$;
- $\tilde{d}_i$ is the degree of node $v_i$ after a self-loop is added;
- $\mathbf{W}^{(l)}$ is the trainable weight matrix of layer $l$.

To fuse information from different modalities (traffic features and temporal log features) more finely, an attention mechanism is introduced into the graph convolution. When information from neighbor $v_j$ is aggregated, its weight $\alpha_{ij}$ depends not only on the node features but also on the temporal edge feature $\mathbf{e}_{ij}$:
$$\alpha_{ij} = \mathrm{softmax}_j\Big(\mathrm{LeakyReLU}\big(\mathbf{a}^{\top}[\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j \,\|\, \mathbf{W}_e \mathbf{e}_{ij}]\big)\Big),$$
where $\mathbf{a}$ is an attention vector, $\mathbf{W}_e$ is a transformation matrix for the edge features, and $\|$ denotes concatenation. In this way, the model can dynamically assign greater weight to connections with suspicious log activity.
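The graph convolution formula of [0073] is equivalent to the matrix form $\mathrm{ReLU}(\tilde{\mathbf{D}}^{-1/2}(\mathbf{A}+\mathbf{I})\tilde{\mathbf{D}}^{-1/2}\mathbf{H}\mathbf{W})$. It can be sketched in plain Python with ReLU as $\sigma$. The code uses the row-vector convention ($\mathbf{H}\mathbf{W}$ rather than $\mathbf{W}\mathbf{h}_j$), which is an implementation choice, not something the source prescribes.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def gcn_layer(A: Matrix, H: Matrix, W: Matrix) -> Matrix:
    """One GCN layer with self-loops and symmetric degree normalization, ReLU activation."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_tilde]                    # degrees including the self-loop
    A_hat = [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A_hat, H), W)]

print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))   # [[2.0], [2.0]]
```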

[0074] 3. Cross-modal fusion and node-level representation: Each node $v_i$ passes through $L$ layers (preferably [value missing], or [value missing]) of the graph convolutional attention network. It thereby obtains a high-order feature representation $\mathbf{h}_i^{(L)}$ that already integrates traffic and log information from multi-hop neighbors.

To further integrate global topology information, a cross-modal fusion module is introduced. This module computes a topology context vector $\mathbf{c}_i$ for each node. The vector can be characterized, for example, by centrality metrics of the node within the graph (such as degree centrality or eigenvector centrality). The unified node embedding $\mathbf{z}_i$ is then obtained through a gated fusion mechanism:
$$\mathbf{z}_i = \mathbf{g}_i \odot \mathbf{h}_i^{(L)} + (1 - \mathbf{g}_i) \odot \mathbf{c}_i,$$
where $\mathbf{g}_i$ is a gating vector and $\odot$ denotes element-wise multiplication.
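Degree centrality as a topology context and the gated fusion of [0074] can be sketched as below.

Assumptions:
- The source only states that $\mathbf{g}_i$ is a gating vector. The element-wise sigmoid gate with parameters `u`, `v`, `b` is a hypothetical parameterization.
- A scalar centrality is broadcast to the embedding dimension to form $\mathbf{c}_i$.

```python
import math
from typing import List, Sequence

def degree_centrality(A: List[List[int]]) -> List[float]:
    """deg(i) / (n - 1) for a graph without self-loops."""
    n = len(A)
    return [sum(row) / (n - 1) if n > 1 else 0.0 for row in A]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h: Sequence[float], c: Sequence[float], u: Sequence[float],
                 v: Sequence[float], b: float = 0.0) -> List[float]:
    """z = g * h + (1 - g) * c with a hypothetical gate g = sigmoid(u*h + v*c + b)."""
    g = [sigmoid(ui * hi + vi * ci + b) for ui, hi, vi, ci in zip(u, h, v, c)]
    return [gi * hi + (1.0 - gi) * ci for gi, hi, ci in zip(g, h, c)]

A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
centrality = degree_centrality(A)                   # [1.0, 0.5, 0.5]
h0 = [0.8, -0.2]                                    # h_0^(L) from the GNN (illustrative)
c0 = [centrality[0]] * len(h0)                      # broadcast scalar context to a vector
z0 = gated_fusion(h0, c0, u=[1.0, 1.0], v=[0.5, 0.5])
```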

[0075] 4. Graph-level classification and threat detection: Threat detection requires judgments based on information from all nodes. A graph readout function aggregates the embeddings $\mathbf{z}_i$ of all nodes into a graph-level global representation $\mathbf{h}_{\mathcal{G}}$. Commonly used readout functions are:
- global average pooling, $\mathbf{h}_{\mathcal{G}} = \frac{1}{|\mathcal{V}|}\sum_{i \in \mathcal{V}} \mathbf{z}_i$;
- global max pooling, $\mathbf{h}_{\mathcal{G}} = \max_{i \in \mathcal{V}} \mathbf{z}_i$ (taken element-wise).

Finally, a multilayer perceptron classifier (e.g., two fully connected layers with a ReLU activation between them and a Softmax at the output) outputs the probability distribution over the attack types (such as DDoS, port scanning, and lateral movement) that the network suffers within the given time window. It also provides a comprehensive threat score.
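The readout and classifier of [0075] can be sketched as follows. Mean or max pooling over node embeddings feeds two fully connected layers with a ReLU between them and a Softmax at the output. The weight matrices in the example are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]

def mean_readout(Z: Matrix) -> List[float]:
    return [sum(col) / len(Z) for col in zip(*Z)]

def max_readout(Z: Matrix) -> List[float]:
    return [max(col) for col in zip(*Z)]

def dense(x: Sequence[float], W: Matrix, b: Sequence[float]) -> List[float]:
    """y = W x + b, with one row of W per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def classify(Z: Matrix, W1: Matrix, b1: Sequence[float], W2: Matrix, b2: Sequence[float],
             readout: Callable[[Matrix], List[float]] = mean_readout) -> List[float]:
    """Readout -> FC -> ReLU -> FC -> Softmax over attack classes."""
    hidden = [max(0.0, v) for v in dense(readout(Z), W1, b1)]
    return softmax(dense(hidden, W2, b2))

Z = [[1.0, 2.0], [3.0, 0.0]]                            # node embeddings z_i
probs = classify(Z, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                 [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
```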

[0076] Understandably, this modeling approach breaks down the barriers of isolated analysis of various data sources in traditional security detection. It encodes and infers interactions (edges) between network entities (nodes) and rich contextual information (time-series logs, topology) within a unified graph structure. Graph convolution and attention mechanisms enable the model to capture the paths and patterns of attacks propagating across the network (e.g., lateral movement from a compromised node to another through exploitation), which is crucial for detecting multiple attack phases of advanced persistent threats. The unified embedding representation provides comprehensive and structured situational awareness input for downstream detection and reinforcement learning decisions.

[0077] Specifically, the cross-modal feature fusion method further includes: using a graph autoencoder to perform unsupervised pre-training on multimodal graph data to learn latent feature representations; introducing a gating mechanism during the fusion process to dynamically adjust the contribution weights of each modality feature; extracting global graph-level features through graph pooling operations and combining them with local node features for multi-scale threat analysis; and utilizing a temporal graph neural network to process the sequence dependencies of log data, capturing the temporal evolution patterns of attack behavior, thereby improving the detection accuracy and context awareness of advanced persistent threats.

[0078] It should be further noted that, to further optimize the effect of cross-modal fusion and the depth of threat detection, this invention also includes the following refined technical means:

[0079] 1. Unsupervised pre-training: In scenarios with limited labeled data, a graph autoencoder is pre-trained on large-scale unlabeled multimodal graph data to learn a general feature representation of network structure. Specifically, an encoder-decoder structure is constructed:
- The encoder, similar to the multilayer graph convolutional network described above, encodes the graph $\mathcal{G} = (\mathbf{A}, \mathbf{X})$ into node embeddings $\mathbf{Z}$.
- The decoder attempts to reconstruct the adjacency matrix $\mathbf{A}$ and the node feature matrix $\mathbf{X}$ from $\mathbf{Z}$.

The reconstruction loss is
$$\mathcal{L}_{\mathrm{rec}} = \|\mathbf{A} - \hat{\mathbf{A}}\|_F^2 + \|\mathbf{X} - \hat{\mathbf{X}}\|_F^2,$$
where $\hat{\mathbf{A}}$ and $\hat{\mathbf{X}}$ are the outputs of the decoder and $\|\cdot\|_F$ is the Frobenius norm. By minimizing the reconstruction loss, the encoder learns low-dimensional embeddings that preserve the graph structure and key node features. The pre-trained encoder weights can then serve as initialization parameters for the downstream supervised threat detection task, improving the model's performance in small-sample scenarios.
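The reconstruction loss of [0079] can be computed as below. The source does not specify the decoder. This sketch assumes the inner-product decoder $\hat{\mathbf{A}} = \mathrm{sigmoid}(\mathbf{Z}\mathbf{Z}^\top)$ of the standard graph autoencoder and takes $\hat{\mathbf{X}}$ as given by a separate feature decoder.

```python
import math
from typing import List

Matrix = List[List[float]]

def inner_product_decoder(Z: Matrix) -> Matrix:
    """A_hat = sigmoid(Z Z^T); assumed decoder, not specified in the source."""
    return [[1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(zi, zj)))) for zj in Z]
            for zi in Z]

def frobenius_sq(M: Matrix, N: Matrix) -> float:
    """||M - N||_F^2."""
    return sum((m - n) ** 2 for rm, rn in zip(M, N) for m, n in zip(rm, rn))

def reconstruction_loss(A: Matrix, X: Matrix, Z: Matrix, X_hat: Matrix) -> float:
    """L_rec = ||A - A_hat||_F^2 + ||X - X_hat||_F^2."""
    return frobenius_sq(A, inner_product_decoder(Z)) + frobenius_sq(X, X_hat)

A = [[0.0, 1.0], [1.0, 0.0]]
X = [[1.0], [2.0]]
print(reconstruction_loss(A, X, Z=[[0.0], [0.0]], X_hat=[[1.0], [2.0]]))   # 1.0
```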

[0080] 2. Dynamic gated fusion: A dynamic gating mechanism is adopted when fusing three inputs: the traffic node features $\mathbf{h}_i^{f}$, the aggregated log edge features $\mathbf{h}_i^{l}$, and the topological context $\mathbf{c}_i$:
$$\mathbf{z}_i = \mathbf{g}_i^{f} \odot \mathbf{h}_i^{f} + \mathbf{g}_i^{l} \odot \mathbf{h}_i^{l} + \mathbf{g}_i^{c} \odot \mathbf{c}_i,$$
where $\mathbf{g}_i^{f}, \mathbf{g}_i^{l}, \mathbf{g}_i^{c}$ are three gate vectors. They are computed from the node's inputs and normalized with a Softmax across the three modalities, so that $\mathbf{g}_i^{f} + \mathbf{g}_i^{l} + \mathbf{g}_i^{c} = \mathbf{1}$. The model can thus decide dynamically whether to rely more on traffic behavior, log alerts, or topological position. This decision depends on the specific context of the current node, for example whether it is a database server or a user terminal.
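The three-way gated fusion of [0080] can be sketched with a per-dimension Softmax over three gate logits. How the logits are produced from the node's inputs (for example, by a learned linear layer) is left open, as in the source. The logits are therefore passed in directly and are hypothetical in the example.

```python
import math
from typing import List, Sequence

def softmax3(a: float, b: float, c: float) -> List[float]:
    m = max(a, b, c)
    e = [math.exp(a - m), math.exp(b - m), math.exp(c - m)]
    s = sum(e)
    return [x / s for x in e]

def tri_modal_fusion(h_f: Sequence[float], h_l: Sequence[float], c: Sequence[float],
                     score_f: Sequence[float], score_l: Sequence[float],
                     score_c: Sequence[float]) -> List[float]:
    """z = g_f*h_f + g_l*h_l + g_c*c, where (g_f, g_l, g_c) sum to 1 in every dimension."""
    z = []
    for k in range(len(h_f)):
        g_f, g_l, g_c = softmax3(score_f[k], score_l[k], score_c[k])
        z.append(g_f * h_f[k] + g_l * h_l[k] + g_c * c[k])
    return z

# A node whose traffic gate dominates: z stays close to the traffic features.
z = tri_modal_fusion([1.0], [0.0], [0.0], score_f=[10.0], score_l=[0.0], score_c=[0.0])
```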

[0081] 3. Multi-scale feature extraction: Hierarchical graph pooling operations (such as DiffPool) progressively cluster semantically similar nodes into supernodes, forming graph representations at different granularities. Suppose pooling yields graph representations $\mathcal{G}^{(1)}, \dots, \mathcal{G}^{(K)}$ at $K$ scales, each with a corresponding graph embedding $\mathbf{h}_{\mathcal{G}}^{(k)}$. The final multi-scale global feature is the concatenation of these embeddings:
$$\mathbf{h}_{\mathrm{multi}} = [\mathbf{h}_{\mathcal{G}}^{(1)} \,\|\, \mathbf{h}_{\mathcal{G}}^{(2)} \,\|\, \cdots \,\|\, \mathbf{h}_{\mathcal{G}}^{(K)}].$$
This multi-scale feature captures both subtle local anomalies and macroscopic global patterns. It therefore helps detect attacks at different scales, from single-point intrusions to network-wide spread.
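One DiffPool coarsening step, and the multi-scale concatenation of [0081], can be sketched as below. The coarsening computes the soft assignment $\mathbf{S} = \mathrm{softmax}(\text{logits})$, the pooled features $\mathbf{X}' = \mathbf{S}^\top\mathbf{Z}$, and the pooled adjacency $\mathbf{A}' = \mathbf{S}^\top\mathbf{A}\mathbf{S}$. In DiffPool the assignment logits come from a separate GNN; here they are passed in directly. Using mean readout at each scale is an illustrative choice.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(X: Matrix, Y: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X: Matrix) -> Matrix:
    return [list(col) for col in zip(*X)]

def row_softmax(L: Matrix) -> Matrix:
    out = []
    for row in L:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

def diffpool_step(A: Matrix, Z: Matrix, assign_logits: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (A', X') for an n x k soft cluster assignment."""
    S = row_softmax(assign_logits)
    St = transpose(S)
    return matmul(matmul(St, A), S), matmul(St, Z)

def mean_readout(Z: Matrix) -> List[float]:
    return [sum(col) / len(Z) for col in zip(*Z)]

def multiscale_feature(levels: List[Matrix]) -> List[float]:
    """h_multi = [h^(1) || ... || h^(K)] using a mean readout at each scale."""
    return [v for Z in levels for v in mean_readout(Z)]

A = [[0.0, 1.0], [1.0, 0.0]]
Z = [[1.0], [3.0]]
A1, X1 = diffpool_step(A, Z, assign_logits=[[0.0], [0.0]])   # both nodes -> one cluster
```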

[0082] 4. Temporal dependency modeling: The temporal log feature sequences $\{\mathbf{e}_{ij}^{(t)}\}$ attached to the edges can be processed with a temporal graph neural network. For example, recurrent units (such as GRU or LSTM) can be introduced into the hidden-state update of each node on top of the graph convolutional network. The feature of node $v_i$ at time step $t$ is then updated as
$$\mathbf{h}_i^{(t)} = \mathrm{GRU}\Big(\mathbf{h}_i^{(t-1)},\ \mathrm{AGGREGATE}\big(\{(\mathbf{h}_j^{(t-1)}, \mathbf{e}_{ij}^{(t)}) : j \in \mathcal{N}(i)\}\big)\Big),$$
where AGGREGATE can be the attention aggregation function described above. This allows the model to capture how attack behavior evolves over time, such as the complete chain from scanning to exploitation and then to command-and-control communication.
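One temporal update of [0082] can be sketched with a standard GRU cell (biases omitted for brevity).

Assumptions:
- The messages $[\mathbf{h}_j \,\|\, \mathbf{e}_{ij}]$ are combined by a plain mean, a hypothetical stand-in for the attention-based AGGREGATE in the text.
- The parameter dictionary `P` and its key names are illustrative.
- Nodes without neighbors keep their previous state.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]

def mv(W: Mat, x: Sequence[float]) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def gru_cell(h: Vec, x: Vec, P: Dict[str, Mat]) -> Vec:
    """r = sig(Wr x + Ur h); u = sig(Wz x + Uz h); n = tanh(Wn x + r * (Un h));
    h' = (1 - u) * n + u * h."""
    r = [sigmoid(a + b) for a, b in zip(mv(P["Wr"], x), mv(P["Ur"], h))]
    u = [sigmoid(a + b) for a, b in zip(mv(P["Wz"], x), mv(P["Uz"], h))]
    n = [math.tanh(a + ri * b) for a, ri, b in zip(mv(P["Wn"], x), r, mv(P["Un"], h))]
    return [(1.0 - ui) * ni + ui * hi for ui, ni, hi in zip(u, n, h)]

def mean_aggregate(msgs: List[Vec]) -> Vec:
    """Hypothetical AGGREGATE: element-wise mean of [h_j || e_ij] messages."""
    return [sum(col) / len(msgs) for col in zip(*msgs)]

def temporal_step(H: Dict[int, Vec], neighbors: Dict[int, List[int]],
                  edge_feat: Dict[Tuple[int, int], Vec], P: Dict[str, Mat]) -> Dict[int, Vec]:
    """h_i^(t) = GRU(h_i^(t-1), AGGREGATE({[h_j^(t-1) || e_ij^(t)]}))."""
    new_H = {}
    for i, nbrs in neighbors.items():
        msgs = [H[j] + edge_feat[(i, j)] for j in nbrs]
        new_H[i] = gru_cell(H[i], mean_aggregate(msgs), P) if msgs else list(H[i])
    return new_H

# With all-zero weights, each gate is 0.5 and the candidate state is 0, so h halves.
P = {k: [[0.0, 0.0]] for k in ("Wr", "Wz", "Wn")}
P.update({k: [[0.0]] for k in ("Ur", "Uz", "Un")})
H1 = temporal_step({0: [2.0], 1: [4.0]}, {0: [1], 1: [0]},
                   {(0, 1): [1.0], (1, 0): [1.0]}, P)
```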

[0083] Understandably, these enhanced fusion and analysis techniques enable the model to gain a more comprehensive and refined understanding of the cybersecurity landscape. Unsupervised pre-training fully leverages massive amounts of unlabeled data, alleviating the problem of scarce labeled data in the security field. Dynamic gating mechanisms endow the model with the ability to adaptively focus on key information. Multi-scale analysis takes into account both local and global threats. Temporal modeling enables the system to understand the attack lifecycle, which is particularly important for detecting advanced persistent threats with long latency periods and dispersed behaviors, greatly improving detection accuracy and the depth of awareness of attack context.

[0084] Specifically, the adversarial training mechanism in S3 includes: alternating between real network data and generated adversarial sample data when training the multimodal graph neural network; designing an adversarial loss function that combines classification loss and adversarial perturbation loss, and updating network parameters through gradient backpropagation; using the projected gradient descent method to generate adversarial perturbations to ensure that adversarial samples are within the effective attack range; and through iterative training, making the multimodal graph neural network insensitive to input perturbations while maintaining high detection accuracy, thereby enhancing the model's robustness against covert attacks and adversarial evasion techniques.

[0085] It should be further explained that adversarial training is a core step in improving model robustness, and its implementation includes the following detailed process:

[0086] 1. Alternating training strategy: Let the parameters of the multimodal graph neural network model be $\theta$. Its standard classification loss function is $\mathcal{L}(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}}[\ell(f_\theta(x), y)]$, where the training data $\mathcal{D}$ consists of labeled real samples (both normal and attack samples).

In each training iteration, the loss is first computed and the parameters are updated on a standard batch of real data. Then, with the model parameters fixed, a batch of adversarial examples is generated against the current model. The adversarial examples are generated with gradient-based attack algorithms such as the Fast Gradient Sign Method (FGSM) or its iterative variant (I-FGSM). Specifically, for a real sample $(x, y)$, the corresponding adversarial example is
$$x_{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \ell(f_\theta(x), y)\big),$$
where:
- $\nabla_x \ell$ is the gradient of the loss with respect to the input;
- $\mathrm{sign}(\cdot)$ is the sign function;
- $\epsilon$ is the perturbation magnitude, a hyperparameter with a preferred value of [value missing] or [value missing] (relative to the range of the normalized features).

This value is chosen so that the perturbation is large enough to cause misclassification but not so large that it significantly alters the semantics of the data (i.e., the change is indistinguishable to the human eye or to simple rules). The classification loss is then recomputed on this batch of adversarial examples $x_{\mathrm{adv}}$ with their original true labels $y$, because an adversarial perturbation does not change the essential category of a sample. This alternating training forces the model to correctly classify both the original sample and slightly perturbed versions within its neighborhood.
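The FGSM formula of [0086] can be illustrated on a stand-in detector. Logistic regression with binary cross-entropy is used because its input gradient has the closed form $(p - y)\mathbf{w}$. This replaces the multimodal graph neural network only for the purpose of showing the update rule.

```python
import math
from typing import List, Sequence

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def bce_input_grad(w: Sequence[float], b: float, x: Sequence[float], y: int) -> List[float]:
    """Gradient of binary cross-entropy w.r.t. x for p = sigmoid(w.x + b): (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v: float) -> int:
    return (v > 0) - (v < 0)

def fgsm(w: Sequence[float], b: float, x: Sequence[float], y: int, eps: float) -> List[float]:
    """x_adv = x + eps * sign(grad_x loss)."""
    return [xi + eps * sign(g) for xi, g in zip(x, bce_input_grad(w, b, x, y))]

w, b = [2.0, -1.0], 0.0
x_adv = fgsm(w, b, [0.5, 0.5], y=1, eps=0.1)   # pushes the score of class 1 down
```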

[0087] 2. Adversarial loss function design: To conduct adversarial training more effectively, the loss function can be designed to combine the standard classification loss with an adversarial regularization term:
$$\mathcal{L}_{\mathrm{AT}}(\theta) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\Big[\ell(f_\theta(x), y) + \beta \max_{\|\delta\|_p \le \epsilon} \ell(f_\theta(x + \delta), y)\Big],$$
where:
- $\mathcal{D}$ is the data distribution;
- $\delta$ is the perturbation added to the input $x$;
- $\|\cdot\|_p$ is the $\ell_p$ norm (usually $p = \infty$ or $p = 2$), and the constraint $\|\delta\|_p \le \epsilon$ keeps the perturbation within acceptable limits;
- $\beta$ is a trade-off coefficient with preferred value [value missing].

The inner maximization expresses the requirement that the model's loss stay as small as possible even under the worst-case perturbation. In practice, the inner maximization is approximated with the projected gradient descent (PGD) attack. PGD is I-FGSM with an added projection step: after each gradient step, the perturbed input is projected back into the $\ell_p$ ball of radius $\epsilon$ centered on the original input $x$, so that the perturbation satisfies the constraint. The PGD iteration is
$$x^{k+1} = \Pi_{\mathcal{B}_\epsilon(x)}\Big(x^{k} + \alpha \cdot \mathrm{sign}\big(\nabla_x \ell(f_\theta(x^{k}), y)\big)\Big),$$
where $\alpha$ is the iteration step size (usually [value missing] or [value missing]) and $\Pi_{\mathcal{B}_\epsilon(x)}$ is the projection operation that ensures $\|x^{k+1} - x\|_p \le \epsilon$.

[0088] 3. Training process: The entire adversarial training process is a minimax optimization problem:
$$\min_\theta \mathbb{E}_{(x, y) \sim \mathcal{D}}\Big[\max_{\|\delta\|_p \le \epsilon} \ell(f_\theta(x + \delta), y)\Big].$$
In actual training, this is implemented with two loops:
- the inner loop (attacker) uses an algorithm such as PGD to find the "optimal" adversarial perturbation for each training sample under the current model;
- the outer loop (defender) uses these adversarial samples to update the model parameters so as to minimize the adversarial loss.

Typically, [value missing] (preferred) or [value missing] PGD iterations are used to generate strong adversarial examples for training.
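The PGD attack of [0087] and the two-loop minimax procedure of [0088] can be sketched together, again on a logistic-regression stand-in for the detector.

Assumptions:
- The inner loop runs $L_\infty$ PGD, with projection done by clipping each coordinate to $[x - \epsilon, x + \epsilon]$.
- The outer loop takes SGD steps on $\ell(x) + \beta\,\ell(x_{\mathrm{adv}})$.
- The defaults for `eps`, `alpha`, `steps`, `beta`, `lr`, and `epochs` are hypothetical, since the preferred values are missing from the text.

```python
import math
import random
from typing import List, Tuple

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def predict(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v: float) -> int:
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, y, eps: float, alpha: float, steps: int) -> List[float]:
    """Inner maximization: signed-gradient ascent, projected onto the L-inf ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        err = predict(w, b, x_adv) - y              # dBCE/dlogit; input grad is err * w
        x_adv = [xa + alpha * sign(err * wi) for xa, wi in zip(x_adv, w)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]  # projection
    return x_adv

def adversarial_train(data: List[Tuple[List[float], int]], eps: float = 0.1,
                      alpha: float = 0.025, steps: int = 7, beta: float = 1.0,
                      lr: float = 0.1, epochs: int = 50, seed: int = 0):
    """Outer minimization of BCE(x) + beta * BCE(x_adv) by per-sample SGD."""
    rng = random.Random(seed)
    data = list(data)                                # avoid shuffling the caller's list
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            x_adv = pgd_linf(w, b, x, y, eps, alpha, steps)   # model held fixed here
            err_c = predict(w, b, x) - y
            err_a = predict(w, b, x_adv) - y
            w = [wi - lr * (err_c * xi + beta * err_a * xa) for wi, xi, xa in zip(w, x, x_adv)]
            b -= lr * (err_c + beta * err_a)
    return w, b

data = [([1.0, 1.0], 1), ([1.0, 0.8], 1), ([-1.0, -1.0], 0), ([-0.9, -1.0], 0)]
w, b = adversarial_train(data)
```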

[0089] Understandably, through this rigorous adversarial training mechanism, multimodal graph neural networks are forced to learn more robust feature representations that are invariant to small, malicious perturbations in the input data. This makes it difficult for attackers to fool the detection model with carefully crafted, imperceptible perturbations (e.g., slight changes in the time intervals or packet size distribution of attack traffic). Therefore, the model's resistance to covert attacks and adversarial samples designed to evade detection is significantly enhanced, enabling it to maintain more stable and high detection performance in real-world deployments.

[0090] Specifically, the design of the deep reinforcement learning agent in S4 includes:
- the agent's state space is defined by the threat detection output of the multimodal graph neural network, real-time network performance indicators, and asset topology changes;
- the action space includes combinations of actions such as dynamic isolation, decoy deployment, traffic rate limiting, policy routing adjustment, and alarm notification;
- the reward function is calculated by jointly considering threat mitigation speed, resource overhead, and false alarm rate;
- the agent updates its policy with the proximal policy optimization (PPO) algorithm, continuously optimizing the policy network and value network using experience replay data collected through interaction with the environment, to achieve adaptive closed-loop defense.

[0091] It should be further explained that the deep reinforcement learning agent is the core of achieving adaptive decision-making, and its design details are as follows:

[0092] 1. State space: The state $s_t$ is a high-dimensional vector composed of three parts, $s_t = [\mathbf{s}_t^{\mathrm{threat}} \,\|\, \mathbf{s}_t^{\mathrm{perf}} \,\|\, \mathbf{s}_t^{\mathrm{topo}}]$.
- The first part is the threat situation vector $\mathbf{s}_t^{\mathrm{threat}}$, output by the multimodal graph neural network at time step $t$. It includes the confidence scores of the various attack types, $\mathbf{p}_t \in [0, 1]^C$ (where $C$ is the number of attack categories), an encoding of the highest threat level, and one-hot encodings of the identifiers of the $M$ most severely affected assets (e.g., $M = 5$).
- The second part is the performance metric vector $\mathbf{s}_t^{\mathrm{perf}}$. It includes the average CPU utilization of core servers, average memory utilization, the bandwidth utilization of critical network links, and overall network latency. These metrics are acquired in real time by the monitoring system and normalized to the interval [0, 1].
- The third part is the topology change vector $\mathbf{s}_t^{\mathrm{topo}}$. It is a binary vector indicating whether, within a past time window, any assets have been added or removed or any network policies (such as firewall rules) have changed.

The total dimension of the state vector is preferably between 100 and 200, so that it contains enough information without becoming too large.
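The state vector of [0092] can be assembled as sketched below.

Assumptions (not stated in the source):
- The threat level is one-hot encoded.
- Each raw performance metric is min-max normalized with known bounds.
- The example sizes are illustrative.

The example sizes (8 attack classes, 4 threat levels, 5 affected assets out of 20, 4 metrics, 3 topology flags) give a 119-dimensional state, inside the preferred 100 to 200 range.

```python
from typing import List, Sequence, Tuple

def one_hot(index: int, size: int) -> List[float]:
    v = [0.0] * size
    v[index] = 1.0
    return v

def minmax(value: float, lo: float, hi: float) -> float:
    """Normalize a raw metric into [0, 1], clipping out-of-range readings."""
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)

def build_state(conf: Sequence[float], threat_level: int, n_levels: int,
                top_assets: Sequence[int], n_assets: int,
                perf_raw: Sequence[float], perf_bounds: Sequence[Tuple[float, float]],
                topo_flags: Sequence[bool]) -> List[float]:
    """s_t = [threat || perf || topo]."""
    threat = list(conf) + one_hot(threat_level, n_levels)
    for asset in top_assets:                         # one one-hot block per affected asset
        threat += one_hot(asset, n_assets)
    perf = [minmax(v, lo, hi) for v, (lo, hi) in zip(perf_raw, perf_bounds)]
    topo = [1.0 if f else 0.0 for f in topo_flags]
    return threat + perf + topo

s = build_state(conf=[0.1] * 8, threat_level=2, n_levels=4,
                top_assets=[0, 3, 5, 7, 9], n_assets=20,
                perf_raw=[50.0, 0.7, 120.0, 30.0],          # CPU %, memory, Mbit/s, ms
                perf_bounds=[(0, 100), (0, 1), (0, 1000), (0, 200)],
                topo_flags=[True, False, False])
print(len(s))   # 119
```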

[0093] 2. Action space: The action space is designed as a combination of multidimensional discrete actions. For example, five independent action subspaces can be defined, $\mathbf{a} = (a^{\mathrm{iso}}, a^{\mathrm{decoy}}, a^{\mathrm{rate}}, a^{\mathrm{route}}, a^{\mathrm{alert}})$, where:
- $a^{\mathrm{iso}}$ is the dynamic isolation action, with possible values {no action, isolate asset A, isolate asset B, …} (based on the list of affected assets in the state);
- $a^{\mathrm{decoy}}$ is the decoy deployment action, with possible values {no deployment, deploy a low-interaction honeypot in subnet X, deploy a high-interaction honeypot in subnet Y};
- $a^{\mathrm{rate}}$ is the traffic rate-limiting action, with possible values {no rate limit, 50% rate limit for IP segment P, 80% rate limit for IP segment Q};
- $a^{\mathrm{route}}$ is the policy routing adjustment action, with possible values {no adjustment, redirect traffic destined for asset Z to the scrubbing center};
- $a^{\mathrm{alert}}$ is the alarm notification action, with possible values {do not notify, notify junior administrator, notify security response team, trigger emergency response plan}.

This combination of actions gives the agent a rich set of response strategies.
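The size of the combined action space in [0093] can be computed, and any flat joint-action index mapped back to one option per subspace, as sketched below. The isolation subspace assumes $M = 5$ affected assets (A to E); the option labels are hypothetical identifiers.

```python
from math import prod
from typing import Dict, List, Tuple

# Sub-action options from [0093]; isolation assumes M = 5 affected assets (A..E).
SUBSPACES: Dict[str, List[str]] = {
    "iso":   ["none"] + [f"isolate_{a}" for a in "ABCDE"],
    "decoy": ["none", "low_interaction_subnet_X", "high_interaction_subnet_Y"],
    "rate":  ["none", "limit_P_50pct", "limit_Q_80pct"],
    "route": ["none", "redirect_Z_to_scrubbing"],
    "alert": ["none", "junior_admin", "response_team", "emergency_plan"],
}

def joint_size() -> int:
    return prod(len(v) for v in SUBSPACES.values())

def decode_joint(index: int) -> Tuple[str, ...]:
    """Mixed-radix decoding of a flat joint-action index; the last subspace varies fastest."""
    if not 0 <= index < joint_size():
        raise IndexError("joint action index out of range")
    choice = []
    for options in reversed(list(SUBSPACES.values())):
        index, r = divmod(index, len(options))
        choice.append(options[r])
    return tuple(reversed(choice))

print(joint_size())      # 432
print(decode_joint(0))   # every subspace set to "none"
```

A joint space of 432 actions is still small enough for a single categorical policy head. A factorized head (one distribution per subspace) scales better if more assets or options are added.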

[0094] 3. Reward function: The reward function is crucial for guiding the agent's learning and must take both security and operational costs into account. A detailed design is as follows.

The safety reward is $R_{\mathrm{sec}} = R_{\mathrm{TTM}} + R_{\mathrm{prev}}$, where:
- $R_{\mathrm{TTM}}$ (Time To Mitigate) is the threat-mitigation-time reward. If, within a time frame $\Delta t$ after the agent's action (e.g., 5 minutes), the threat confidence drops by more than a threshold $\tau$ (e.g., 50%), a positive reward of +2 is given; otherwise 0 is given.
- $R_{\mathrm{prev}}$ is the attack-prevention reward. If the action successfully prevents a verifiable real attack (e.g., confirmed through sandbox analysis), a high reward of +5 is given.

The cost penalty is $R_{\mathrm{cost}} = R_{\mathrm{FP}} + R_{\mathrm{overhead}}$, where:
- $R_{\mathrm{FP}}$ (False Positive) is the false-alarm penalty. If the target of the action is ultimately verified as a false alarm (e.g., isolating a normal business server), a penalty of -3 is applied.
- $R_{\mathrm{overhead}}$ is the resource-cost penalty, calculated from the action's consumption. For example, isolating a core server incurs a penalty of -2, deploying a high-interaction honeypot -0.5, and sending an advanced alert -0.1.

The total reward is
$$r = w_{\mathrm{sec}} R_{\mathrm{sec}} + w_{\mathrm{cost}} R_{\mathrm{cost}},$$
where the coefficients $w_{\mathrm{sec}}$ and $w_{\mathrm{cost}}$ set the trade-off, with preferred values [value missing]. $w_{\mathrm{sec}}$ is chosen slightly higher to encourage active defense. The lower $w_{\mathrm{cost}}$ reflects that resource consumption is a secondary consideration relative to security.
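The reward of [0094] can be computed as below, using the +2 / +5 / -3 values and the listed overhead penalties.

Assumptions:
- The 50% threshold is read as a relative drop in threat confidence.
- Actions not listed in the source carry no overhead.
- The weights `w_sec` and `w_cost` are hypothetical, since the preferred values are missing.

```python
from typing import Iterable

# Overhead penalties listed in [0094]; unlisted action types default to 0 (an assumption).
OVERHEAD = {"isolate_core_server": -2.0, "deploy_high_interaction_honeypot": -0.5,
            "send_advanced_alert": -0.1}

def ttm_reward(conf_before: float, conf_after: float, tau: float = 0.5) -> float:
    """+2 if threat confidence fell by more than tau (relative) within the window, else 0."""
    if conf_before <= 0.0:
        return 0.0
    return 2.0 if (conf_before - conf_after) / conf_before > tau else 0.0

def total_reward(conf_before: float, conf_after: float, prevented_real_attack: bool,
                 false_positive: bool, actions: Iterable[str],
                 w_sec: float = 1.0, w_cost: float = 0.8) -> float:
    """r = w_sec * R_sec + w_cost * R_cost (penalties are already negative)."""
    r_sec = ttm_reward(conf_before, conf_after) + (5.0 if prevented_real_attack else 0.0)
    r_cost = (-3.0 if false_positive else 0.0) + sum(OVERHEAD.get(a, 0.0) for a in actions)
    return w_sec * r_sec + w_cost * r_cost

r = total_reward(0.9, 0.3, prevented_real_attack=True, false_positive=False,
                 actions=["isolate_core_server", "send_advanced_alert"])
print(round(r, 2))   # 5.32
```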

[0095] 4. Algorithm and training: The agent uses the proximal policy optimization (PPO) algorithm. It has a policy network $\pi_\theta$ (Actor), which outputs the probability distribution over actions given a state, and a value network $V_\phi$ (Critic), which evaluates the value of the current state.

During training, the agent runs in a simulated environment or a securely hardened real-world network test environment and collects trajectory data, which is stored in an experience replay buffer. Batches are periodically sampled from the buffer, and advantage estimates $\hat{A}_t$ are computed. The PPO objective used to update the policy network is
$$L^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\Big[\min\big(r_t(\theta)\hat{A}_t,\ \mathrm{clip}(r_t(\theta), 1 - \epsilon, 1 + \epsilon)\hat{A}_t\big)\Big],$$
where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)$ is the probability ratio between the new and old policies. $\epsilon$ is a hyperparameter (usually 0.2) that limits the size of policy updates to keep training stable. The value network is updated to minimize the error between its estimated value and the actual return.

Through many interaction iterations, the agent learns which combinations of defensive actions maximize long-term cumulative reward under different threat situations, network loads, and topology changes.
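The PPO clipped surrogate and the critic's squared-error loss from [0095] can be computed on a batch of log-probabilities, advantages, and returns. This sketch covers only the loss values; the network updates and advantage estimation are omitted.

```python
import math
from typing import Sequence

def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean of min(r*A, clip(r, 1-eps, 1+eps)*A), r = exp(logp_new - logp_old).

    This is maximized; a gradient-based trainer would minimize its negative."""
    total = 0.0
    for lp_n, lp_o, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_n - lp_o)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

def value_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between critic estimates and observed returns."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

The clip makes the objective flat once the ratio leaves $[1-\epsilon, 1+\epsilon]$ in the direction that would increase the objective. For example, a ratio of 1.5 with a positive advantage contributes only $1.2\hat{A}$.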

[0096] Understandably, this deep reinforcement learning agent design upgrades the defense system from static rule-based automation to a learning-based adaptive agent. The comprehensive definition of the state space enables it to perceive the complete cybersecurity situation; the rich combinatorial action space provides flexible and diverse response capabilities; the carefully designed reward function guides it to find the optimal balance between security effectiveness and operational costs; and the PPO algorithm ensures the stability and efficiency of policy learning. Ultimately, the system can achieve autonomous decision-making and continuous optimization in complex environments, forming a truly adaptive closed-loop defense.

[0097] Specifically, further optimization of the defense strategy adjustment actions includes: selecting the optimal defense combination based on the action probability distribution output by the deep reinforcement learning agent; dynamically isolating infected assets by modifying flow table rules in real time through a software-defined network controller; automatically generating and deploying a high-interaction honeypot system to mislead attackers and collect attack intelligence; introducing a security policy consistency check into the strategy adjustment process to ensure that defense actions do not disrupt normal network services; and continuously improving the strategy by feeding the defense effectiveness evaluation back to the agent as a reward signal through a feedback loop, so that it adapts to the ever-changing threat environment.

[0098] It should be further explained that the automatic execution and optimization of the defense strategy is the execution stage of the closed-loop defense and involves the following implementation details:

[0099] 1. Action Selection and Execution Optimization: The agent's policy network outputs the probability distribution of each possible action. During the exploration phase, random sampling based on probability can be used to explore new strategies; during the deployment phase, the action with the highest probability is usually selected. Alternatively, the action with the highest expected value can be selected. For combined actions, each sub-action can be selected independently, or a joint action space can be designed for overall selection. During execution, a central policy executor (Orchestrator) parses the abstract action instructions into concrete executable commands.
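The selection logic can be sketched as follows, assuming the policy network's output is factorized into one probability distribution per sub-action dimension (the independent-selection option above). The dimension names, action labels and the command strings produced by the hypothetical orchestrator table are illustrative only.

```python
import random
from typing import Callable, Dict, List, Optional

Distribution = Dict[str, float]


def select_action(probs: Distribution, explore: bool,
                  rng: Optional[random.Random] = None) -> str:
    """Sample during exploration; take the most probable action during deployment."""
    if explore:
        rng = rng or random.Random()
        actions = list(probs)
        return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
    return max(probs, key=probs.__getitem__)


def select_combination(per_dimension: Dict[str, Distribution], explore: bool,
                       rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Factorized combined action: each sub-action dimension is chosen independently."""
    return {dim: select_action(p, explore, rng) for dim, p in per_dimension.items()}


# Hypothetical orchestrator table: abstract action label -> concrete command builder.
COMMAND_BUILDERS: Dict[str, Callable[[str], str]] = {
    "isolate": lambda asset: f"sdn:isolate {asset}",
    "deploy_honeypot": lambda asset: f"orchestrator:deploy-honeypot near {asset}",
    "alert": lambda asset: f"siem:alert {asset}",
}


def to_commands(combo: Dict[str, str], asset: str) -> List[str]:
    """Parse abstract sub-actions into command strings; 'none' is skipped,
    unknown labels raise KeyError."""
    return [COMMAND_BUILDERS[a](asset) for a in combo.values() if a != "none"]
```

A joint action space would replace `select_combination` with a single distribution over all combinations; the factorized form keeps the output layer small at the cost of ignoring correlations between sub-actions.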

[0100] 2. Specific implementation of dynamic isolation: When the agent decides to isolate asset ..., the policy executor calls the northbound API of the software-defined network controller and instructs it to issue one or more OpenFlow flow table rules. For example, one rule matches source IP = the asset's IP address with its action set to DROP; at the same time, a DROP rule matching destination IP = the asset's IP address is issued, so that together the two rules achieve two-way isolation. To minimize impact, the priority and timeout of the rules can be set (e.g., the isolation is automatically lifted after 1 hour unless the agent renews it). In addition, critical services can be migrated to a backup node before isolation.
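As an illustration, the two DROP rules can be expressed as data before they are sent to the controller. The field names follow OpenFlow match-field naming (`eth_type`, `ipv4_src`, `ipv4_dst`) and flow-entry attributes (`priority`, `hard_timeout`), but the JSON layout and the priority value are assumptions: each SDN controller's northbound API defines its own payload format. Only IPv4 traffic is matched in this sketch.

```python
import ipaddress
import json
from typing import Any, Dict, List

IPV4_ETHERTYPE = 0x0800
DEFAULT_ISOLATION_TIMEOUT_S = 3600  # "automatically lift the isolation after 1 hour"


def _drop_rule(match: Dict[str, Any], priority: int, timeout_s: int) -> Dict[str, Any]:
    # In OpenFlow, a flow entry with no output action drops matching packets.
    return {"priority": priority, "hard_timeout": timeout_s, "match": match, "actions": []}


def isolation_rules(asset_ip: str, priority: int = 40000,
                    timeout_s: int = DEFAULT_ISOLATION_TIMEOUT_S) -> List[Dict[str, Any]]:
    """Two DROP rules (source match and destination match) for two-way isolation."""
    ip = str(ipaddress.IPv4Address(asset_ip))  # raises ValueError on malformed input
    return [
        _drop_rule({"eth_type": IPV4_ETHERTYPE, "ipv4_src": ip}, priority, timeout_s),
        _drop_rule({"eth_type": IPV4_ETHERTYPE, "ipv4_dst": ip}, priority, timeout_s),
    ]


def renew(rules: List[Dict[str, Any]],
          timeout_s: int = DEFAULT_ISOLATION_TIMEOUT_S) -> List[Dict[str, Any]]:
    """Copies of the rules with a new hard timeout, re-issued when the agent extends isolation."""
    return [{**r, "hard_timeout": timeout_s} for r in rules]


# Body for a hypothetical northbound call; the asset IP is an example value.
payload = json.dumps({"flows": isolation_rules("10.0.0.5")})
```

Using `hard_timeout` rather than an idle timeout makes the isolation expire on schedule even if the asset keeps sending traffic, which matches the "lift after 1 hour unless renewed" behavior.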

[0101] 3. Specific implementation of decoy deployment: The decoy deployment system maintains a library of honeypot images that can be instantiated quickly. When the agent decides to deploy a high-interaction honeypot in subnet ..., the policy executor instructs the cloud management platform or container orchestration system (such as Kubernetes) to launch a pre-configured honeypot instance (such as a Cowrie SSH honeypot or a Dionaea network service honeypot) inside that subnet. At the same time, the relevant routing or DNS settings are modified automatically so that the honeypot is visible and attractive to attackers (e.g., by assigning it an IP address that appears to be unused). All attacker interaction data collected by the honeypot (login credentials, attack payloads, C2 server addresses, etc.) is fed back in real time to the threat intelligence platform and the analysis system to enrich attack profiles and to generate new adversarial samples.
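A sketch of the decoy step follows: it builds a Kubernetes Pod manifest for a Cowrie SSH honeypot and extracts login credentials from honeypot events for the intelligence feed. The image name `cowrie/cowrie:latest`, port 2222, the per-subnet namespace convention, and the assumption that Cowrie's JSON log marks login attempts with the `eventid` values `cowrie.login.success` / `cowrie.login.failed` and carries `src_ip`, `username` and `password` fields should all be checked against the deployed versions. The routing and DNS changes are not shown.

```python
import json
from typing import Any, Dict, Iterable, List


def honeypot_pod(subnet_id: str, name: str, image: str = "cowrie/cowrie:latest",
                 port: int = 2222) -> Dict[str, Any]:
    """Kubernetes Pod manifest for one pre-configured SSH honeypot instance."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            "namespace": f"decoy-{subnet_id}",  # hypothetical one-namespace-per-subnet convention
            "labels": {"app": "honeypot", "decoy-subnet": subnet_id},
        },
        "spec": {
            "containers": [{
                "name": "cowrie",
                "image": image,
                "ports": [{"containerPort": port, "protocol": "TCP"}],
            }]
        },
    }


def login_intel(json_lines: Iterable[str]) -> List[Dict[str, str]]:
    """Extract attacker credentials from Cowrie-style JSON event lines (assumed format)."""
    intel = []
    for line in json_lines:
        event = json.loads(line)
        if event.get("eventid") in ("cowrie.login.success", "cowrie.login.failed"):
            intel.append({k: event.get(k, "") for k in ("src_ip", "username", "password")})
    return intel


# kubectl accepts JSON as well as YAML manifests.
manifest = json.dumps(honeypot_pod("subnet-a", "cowrie-ssh-1"), indent=2)
```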

[0102] 4. Security Policy Consistency Check: Before executing any defensive action, the policy executor invokes a policy consistency check module. This module maintains a network service dependency graph and a security policy baseline. For a planned isolation action, the check module simulates its execution to determine whether it would interrupt critical business services (e.g., isolating a database server would make the web service unavailable). If a conflict is detected, the check module can suggest a modified action (e.g., migrating the service before isolation) or block the action and return a negative reward signal to the agent. This ensures the reliability of automated defense.
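The simulated execution performed by the check module can be sketched as a reachability walk over the service dependency graph. This assumes a simple model in which a service becomes unavailable if any one of its dependencies is isolated; the service names are hypothetical, and a real baseline would also need to account for redundancy and failover.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def unavailable_after(depends_on: Dict[str, List[str]], isolated: Iterable[str]) -> Set[str]:
    """All nodes that become unavailable if `isolated` go offline.

    Simplifying assumption: a service fails if any one of its dependencies fails
    (no redundancy or failover is modelled).
    """
    dependents: Dict[str, List[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)
    down = set(isolated)
    queue = deque(down)
    while queue:  # breadth-first walk along reverse dependency edges
        node = queue.popleft()
        for parent in dependents.get(node, []):
            if parent not in down:
                down.add(parent)
                queue.append(parent)
    return down


def check_isolation(depends_on: Dict[str, List[str]], critical: Set[str],
                    target: str) -> Tuple[bool, Set[str]]:
    """Simulate isolating `target`; return (allowed, critical services that would break)."""
    broken = unavailable_after(depends_on, [target]) & critical
    return (not broken, broken)
```

With `{"web": ["db"], "db": []}` and `web` marked critical, isolating `db` is rejected and `{"web"}` is reported as the conflict, mirroring the database/web example above.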

[0103] 5. Feedback Loop and Strategy Iteration: After defensive actions are executed, their effectiveness must be evaluated and fed back to the reinforcement learning agent. The effectiveness evaluation module monitors several key indicators: a) whether the target threat index has decreased; b) whether malicious activity persists on the isolated or affected assets (confirmed through other detection methods); c) whether any false alarms occurred (determined through manual review or an automated verification process). These evaluation results are quantified into reward signals (as defined in the reward function above) and stored, together with the new state, in the experience replay buffer. The agent periodically uses this new experience data to update its policy and value networks, thereby learning optimal strategies for the latest network environment and threat landscape. This continuous feedback and learning enables the system to adapt to changes in network topology, updates to business applications, and the evolution of attack techniques.
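A sketch of the feedback path: indicators a) to c) are mapped onto reward values reused from the reward function (+2 for mitigation, -3 for a false alarm), and transitions are kept in a fixed-capacity buffer from which batches are sampled. The exact mapping and the `Transition` layout are assumptions made for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List, Optional


@dataclass
class Transition:
    state: Any
    action: str
    reward: float
    next_state: Any


def evaluation_reward(threat_index_dropped: bool, activity_persists: bool,
                      false_alarm: bool) -> float:
    """Hypothetical mapping of indicators a)-c) onto the reward terms."""
    r = 2.0 if threat_index_dropped and not activity_persists else 0.0  # mitigation credit
    if false_alarm:
        r -= 3.0  # false-positive penalty
    return r


class ReplayBuffer:
    """Fixed-capacity store; the oldest experience is discarded first."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        self._items: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        self._items.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Sample without replacement; never ask for more items than are stored.
        return self._rng.sample(list(self._items), min(batch_size, len(self._items)))

    def __len__(self) -> int:
        return len(self._items)
```

The bounded `deque` gives the "latest environment" emphasis described above: once capacity is reached, older experience from a superseded topology or threat landscape is evicted first.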

[0104] Understandably, by refining and optimizing the execution chain of defensive actions, it is ensured that intelligent decisions can be accurately, reliably, and securely translated into actual network defense operations. Dynamic isolation and decoy deployment provide immediate and proactive defense measures. Policy consistency checks act as a safety valve to ensure business continuity, preventing secondary problems caused by automated defense. The final feedback loop is key to the system's "adaptive" capability, enabling defense strategies to learn from actual results, continuously evolve, and ultimately form an autonomous security defense system that becomes increasingly intelligent and accurate with use.

[0105] This invention provides another embodiment, which offers an adaptive network security monitoring system that integrates adversarial driving and cross-modal fusion. The adaptive network security monitoring system that integrates adversarial driving and cross-modal fusion includes:

[0106] (1) Adversarial generation and reinforcement learning collaborative module, used to dynamically generate adversarial samples of advanced persistent threats through a generative adversarial network and to simulate the attack response process in combination with reinforcement learning.

[0107] It should be further noted that this system is an implementation of the aforementioned methods on physical or virtual computing devices. Each module can be deployed on a distributed server cluster, communicating via high-speed networks and message queues. The adversarial generation and reinforcement learning collaboration module contains a high-performance computing unit that runs the generative adversarial network model and the reinforcement learning simulation environment. It reads network baseline data from the data store, runs the generator to produce adversarial examples, and runs the reinforcement learning agent in a simulated network environment for policy game training. The output of this module is a continuously updated adversarial example library and a reinforcement learning agent policy model.

[0108] (2) Multimodal graph neural network modeling module, used to perform unified graph structure modeling on network traffic data, system log data and asset topology data, extract cross-modal features and perform threat detection.

[0109] It should be further explained that the multimodal graph neural network modeling module is a data processing and model inference engine. It connects to network traffic probes, log collectors, and asset management systems, receiving multi-source data streams in real time. The internal data preprocessing submodule is responsible for converting the raw data into a unified graph structure. The trained multimodal graph neural network model is loaded and run in this module, performing forward propagation on the input graph data and outputting threat detection results (including alert events, threat levels, affected assets, etc.) in real time.
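To make the conversion into a unified graph concrete, the sketch below follows the mapping stated in claim 5 (traffic to node features, logs to time-ordered edge features, topology to edges) and applies one graph convolution layer with the symmetric normalization D^-1/2 (A + I) D^-1/2, a common GCN formulation assumed here rather than stated in the text. Node identifiers and feature contents are hypothetical, and the attention-based fusion and the classifier are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

LogEvent = Tuple[float, str, str, str]  # (timestamp, src_node, dst_node, event_type)


def build_graph(topology: Sequence[Tuple[str, str]], traffic: Dict[str, List[float]],
                logs: Sequence[LogEvent]):
    """Topology -> edges, traffic statistics -> node features, logs -> time-ordered edge features."""
    nodes = sorted({n for edge in topology for n in edge} | set(traffic))
    index = {n: i for i, n in enumerate(nodes)}
    dim = len(next(iter(traffic.values()))) if traffic else 0
    features = [list(traffic.get(n, [0.0] * dim)) for n in nodes]
    edge_events: Dict[Tuple[str, ...], List[Tuple[float, str]]] = {
        tuple(sorted(e)): [] for e in topology
    }
    for ts, src, dst, event in sorted(logs):  # tuples sort by timestamp first
        key = tuple(sorted((src, dst)))
        if key in edge_events:  # events on links absent from the topology are ignored here
            edge_events[key].append((ts, event))
    edges = [(index[a], index[b]) for a, b in edge_events]
    return nodes, features, edges, edge_events


def gcn_layer(features: List[List[float]], edges: Sequence[Tuple[int, int]],
              weight: List[List[float]]) -> List[List[float]]:
    """One layer H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) on an undirected graph."""
    n = len(features)
    neighbours = [{i} for i in range(n)]  # self-loops (the "+ I" term)
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    degree = [len(s) for s in neighbours]
    in_dim, out_dim = len(weight), len(weight[0])
    out = []
    for i in range(n):
        agg = [0.0] * in_dim
        for j in neighbours[i]:
            norm = 1.0 / math.sqrt(degree[i] * degree[j])
            for k in range(in_dim):
                agg[k] += norm * features[j][k]
        out.append([max(0.0, sum(agg[k] * weight[k][c] for k in range(in_dim)))
                    for c in range(out_dim)])
    return out
```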

[0110] (3) Adversarial training module, used to perform adversarial training on the multimodal graph neural network using the adversarial samples to enhance the robustness of the model against covert attacks.

[0111] It should be further explained that the adversarial training module is a model training platform. It allocates computing resources, loads multimodal graph neural network models and adversarial examples generated by the adversarial generation module, executes the aforementioned adversarial training algorithms (such as PGD adversarial training), and periodically updates and releases more robust threat detection models to the modeling module.
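PGD adversarial training, named above as one option, can be illustrated on a deliberately tiny model. The sketch below substitutes a logistic-regression detector for the multimodal graph neural network and trains on an equal mix of clean and L∞-PGD-perturbed samples; ε = 0.1, the step size, the iteration count and the learning rate are hypothetical values.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label in {0, 1})


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def _logit(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def pgd_attack(w, b, x, y, eps=0.1, step=0.025, iters=10) -> List[float]:
    """L-infinity PGD: step along sign(grad_x of the loss), then project into the eps-ball."""
    x_adv = list(x)
    for _ in range(iters):
        g = _sigmoid(_logit(w, b, x_adv)) - y  # d(BCE)/d(logit); grad_x = g * w
        x_adv = [min(max(xa + step * ((g * wi > 0) - (g * wi < 0)), xi - eps), xi + eps)
                 for xa, wi, xi in zip(x_adv, w, x)]
    return x_adv


def adversarial_train(data: Sequence[Sample], epochs=50, lr=0.1, eps=0.1, seed=0):
    """Minimize 0.5 * clean loss + 0.5 * PGD-adversarial loss with per-sample SGD."""
    rng = random.Random(seed)
    order = list(data)
    w, b = [0.0] * len(order[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(order)
        for x, y in order:
            x_adv = pgd_attack(w, b, x, y, eps)
            grad_w, grad_b = [0.0] * len(w), 0.0
            for xs in (x, x_adv):  # both gradients taken at the same parameters
                g = 0.5 * (_sigmoid(_logit(w, b, xs)) - y)
                grad_w = [gw + g * xi for gw, xi in zip(grad_w, xs)]
                grad_b += g
            w = [wi - lr * gw for wi, gw in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b


def predict(w, b, x) -> int:
    return int(_logit(w, b, x) > 0)
```

The same loop structure carries over to a graph model: only the forward pass and the gradient with respect to the input features change, while the inner maximization (PGD) and outer minimization (parameter update) stay the same.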

[0112] (4) Deep reinforcement learning agent module, used to automatically learn and execute defense strategy adjustment actions based on threat detection results, so as to realize automated closed-loop defense;

[0113] The modules work together to achieve adaptive network security monitoring through adversarial driving and cross-modal fusion.

[0114] It should be further explained that the deep reinforcement learning agent module is the system's decision-making and control center. It receives real-time threat detection results from the modeling module and combines them with performance metrics from the network monitoring system to form state awareness. It loads the trained policy network and outputs defensive action decisions (such as "isolate server X") based on the current state. This module translates abstract actions into concrete configuration commands for execution through policy execution interfaces (such as calling the SDN controller API or cloud platform API).

[0115] Furthermore, after system startup, each module initializes. The modeling module continuously performs threat detection. Initially, the reinforcement learning agent uses an initial strategy or a rule-based fallback strategy. The adversarial training module periodically retrains the detection model using new samples generated by the collaboration module to improve its robustness. The reinforcement learning agent in the collaboration module also continuously interacts and learns with the simulated environment. When the modeling module detects a high-confidence threat, it triggers the reinforcement learning agent module to make a decision and execute defensive actions. The effectiveness of the actions is monitored and fed back to the collaboration module and the agent module to update rewards and strategies. The entire system forms a complete closed loop of data collection, threat detection, intelligent decision-making, action execution, effect evaluation, and model optimization.
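The runtime flow described above can be summarized as a control loop. In this sketch the confidence threshold, the threat-type labels and the fallback rules are hypothetical; the executor and evaluator are passed in as callables so that SDN, cloud-platform or honeypot back ends can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Detection:
    asset: str
    threat_type: str
    confidence: float


# Hypothetical rule-based fallback used before a learned policy is available.
FALLBACK_RULES = {"c2_beacon": "isolate", "lateral_scan": "deploy_honeypot"}
TRIGGER_CONFIDENCE = 0.9  # hypothetical "high-confidence" threshold


def decide(det: Detection,
           learned_policy: Optional[Callable[[Detection], str]]) -> Optional[str]:
    if det.confidence < TRIGGER_CONFIDENCE:
        return None  # keep monitoring; the agent is not triggered
    if learned_policy is None:
        return FALLBACK_RULES.get(det.threat_type, "alert")
    return learned_policy(det)


def run_cycle(detections: List[Detection],
              learned_policy: Optional[Callable[[Detection], str]],
              execute: Callable[[str, str], None],
              evaluate: Callable[[Detection, str], float]) -> List[Tuple[Detection, str, float]]:
    """detect -> decide -> execute -> evaluate; the returned feedback feeds reward/policy updates."""
    feedback = []
    for det in detections:
        action = decide(det, learned_policy)
        if action is None:
            continue
        execute(det.asset, action)
        feedback.append((det, action, evaluate(det, action)))
    return feedback
```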

[0116] Understandably, this system organically integrates adversarial thinking, multimodal fusion technology, and reinforcement learning decision-making into a unified architecture. It is not merely a detection system, but rather an adaptive defense platform with self-evolving capabilities. Through modular design, the system possesses excellent scalability, allowing for easy integration of new data sources or deployment of new defense action executors. The collaborative work between modules enables the system to continuously learn and evolve from ongoing "attack-defense game" and "real-world feedback," thereby providing continuous, intelligent, and automated security protection capabilities in the face of increasingly complex and dynamically changing cyber threats.

[0117] In a preferred embodiment, this application also provides an electronic device, the electronic device comprising:

[0118] The computer device includes a memory and a processor, wherein the memory stores computer-readable instructions that, when executed by the processor, implement the adversarial-driven and cross-modal fusion adaptive network security monitoring method. The computer device can be broadly categorized as a server, terminal, or any other electronic device with the necessary computing and / or processing capabilities. In one embodiment, the computer device may include a processor, memory, network interface, communication interface, etc., connected via a system bus. The processor of the computer device can be used to provide the necessary computing, processing, and / or control capabilities. The memory of the computer device may include non-volatile storage media and internal memory. The non-volatile storage media may store an operating system, computer programs, etc. The internal memory can provide an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface and communication interface of the computer device can be used to connect and communicate with external devices via a network. When the computer program is executed by the processor, it performs the steps of the method of the present invention.

[0119] This invention can be implemented as a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, causes the steps of the methods of embodiments of the invention to be performed. In one embodiment, the computer program is distributed across multiple network-coupled computer devices or processors, such that the computer program is stored, accessed, and executed in a distributed manner by one or more computer devices or processors. A single method step / operation, or two or more method steps / operations, may be executed by a single computer device or processor or by two or more computer devices or processors. One or more method steps / operations may be executed by one or more computer devices or processors, and one or more other method steps / operations may be executed by one or more other computer devices or processors. One or more computer devices or processors may execute a single method step / operation, or execute two or more method steps / operations.

[0120] Those skilled in the art will understand that the method steps of this invention can be performed by a computer program instructing related hardware, such as a computer device or processor, to perform the steps of this invention when executed. Depending on the context, any references herein to memory, storage, databases, or other media may include non-volatile and / or volatile memory. Examples of non-volatile memory include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid-state drive, etc. Examples of volatile memory include random access memory (RAM), external cache memory, etc.

[0121] The technical features described above can be combined arbitrarily. Although not all possible combinations of these technical features are described, any combination of these technical features should be considered to be covered by this specification, provided that such combination does not contain contradictions.

[0122] The specific embodiments of the present invention described above do not constitute a limitation on the scope of protection of the present invention. Any other corresponding changes and modifications made in accordance with the technical concept of the present invention should be included within the scope of protection of the claims of the present invention.

Claims

1. An adaptive network security monitoring method that integrates adversarial driving and cross-modal fusion, characterized in that it comprises: S1. dynamically generating adversarial examples of advanced persistent threats, and simulating attack responses based on response strategies, through a collaborative framework of generative adversarial networks and reinforcement learning; S2. using multimodal graph neural networks to uniformly model network traffic data, system log data, and asset topology data, and extracting cross-modal features and performing threat detection through graph convolution operations; S3. introducing an adversarial training mechanism and using the adversarial examples to perform adversarial training on the multimodal graph neural network, so as to enhance the model's robustness against covert and mutation attacks; S4. deploying a deep reinforcement learning agent to automatically learn and execute defense strategy adjustment actions based on the threat detection results, thereby achieving automated closed-loop defense from detection to strategy adjustment, wherein the defense strategy adjustment actions include dynamically isolating affected assets and deploying decoy systems.

2. The method according to claim 1, characterized in that the collaborative framework of generative adversarial networks and reinforcement learning in S1 specifically includes: a generator network, which receives random noise vectors and current network state features as input and outputs adversarial traffic samples simulating advanced persistent threats; a discriminator network, which distinguishes between real network traffic samples and the generated adversarial traffic samples and outputs a discrimination probability; and a reinforcement learning agent, which interacts with the generator network and the discriminator network, uses the adversarial traffic samples as a simulated attack environment, and learns an optimal response strategy through a state-action-reward loop, the response strategy including real-time blocking of malicious connections, redirection of traffic to a sandbox, or triggering of an alarm escalation mechanism; wherein the collaborative framework dynamically optimizes adversarial sample generation and the response strategy by alternately training the generator, the discriminator, and the reinforcement learning agent.

3. The method according to claim 2, characterized in that the generator network employs a deep neural network architecture, including multiple fully connected layers and convolutional layers, to learn advanced attack pattern features from the random noise and network state; the discriminator network includes a graph attention layer and a fully connected classification layer, which distinguish samples in combination with the output of the multimodal graph neural network; and the reinforcement learning agent is designed based on the deep deterministic policy gradient algorithm, its state space being jointly composed of the threat detection output of the multimodal graph neural network and the network topology state, its action space including a variety of defensive operations, and its reward function being dynamically adjusted according to the attack mitigation effect and a false positive penalty so as to continuously optimize the policy.

4. The method according to claim 3, characterized in that the process of generating adversarial examples further includes: injecting the adversarial traffic samples output by the generator network into real network traffic data through adversarial perturbations to form a hybrid training dataset; evaluating the hybrid dataset with the discriminator network to generate an adversarial loss signal; and optimizing the parameters of the generator network using the adversarial loss signal combined with the reward signal of the reinforcement learning agent, so as to generate more realistic and diverse attack samples, thereby improving the generalization ability of the multimodal graph neural network in adversarial training and enhancing its robustness in detecting unknown threats.

5. The method according to claim 1, characterized in that the unified modeling by the multimodal graph neural network in S2 specifically includes: transforming network traffic data into graph node features, system log data into time-series edge features, and asset topology data into graph structure connections; aggregating node features through graph convolutional network layers, and weighting and fusing the feature representations of different modalities using a graph attention mechanism; concatenating, or computing a weighted sum of, the traffic, log, and topology feature vectors in a cross-modal fusion module to generate a unified graph embedding representation; and, based on this graph embedding representation, performing threat detection with a multilayer perceptron classifier that outputs attack probabilities and type labels.

6. The method according to claim 5, characterized in that the cross-modal feature fusion method further includes: performing unsupervised pre-training on the multimodal graph data using a graph autoencoder to learn latent feature representations; introducing a gating mechanism during fusion to dynamically adjust the contribution weight of each modality's features; extracting global graph-level features through graph pooling operations and combining them with local node features for multi-scale threat analysis; and using temporal graph neural networks to process the sequential dependencies of log data and capture the temporal evolution patterns of attack behavior, thereby improving detection accuracy and context awareness for advanced persistent threats.

7. The method according to claim 1, characterized in that the adversarial training mechanism in S3 specifically includes: using real network data and generated adversarial example data alternately when training the multimodal graph neural network; designing an adversarial loss function that combines a classification loss with an adversarial perturbation loss, and updating the network parameters through gradient backpropagation; generating adversarial perturbations with an adversarial perturbation method so that the adversarial examples remain within the effective attack range; and, through iterative training, making the multimodal graph neural network insensitive to input perturbations while maintaining high detection accuracy, thereby enhancing the model's robustness against covert attacks and adversarial evasion techniques.

8. The method according to claim 1, characterized in that the design of the deep reinforcement learning agent in S4 includes: a state space defined as the threat detection output of the multimodal graph neural network, real-time network performance metrics, and asset topology changes; an action space including combinations of actions such as dynamic isolation, decoy deployment, traffic rate limiting, policy routing adjustment, and alarm notification; a reward function calculated from a comprehensive consideration of threat mitigation speed, resource overhead, and false alarm rate; and policy updates using the proximal policy optimization algorithm, whereby the agent continuously optimizes its policy network and value network by interacting with the environment to achieve adaptive closed-loop defense.

9. The method according to claim 8, characterized in that further optimization of the defense strategy adjustment actions includes: selecting the optimal defense combination based on the action probability distribution output by the deep reinforcement learning agent; implementing dynamic isolation actions by modifying flow table rules in real time through a software-defined network controller to isolate infected assets; implementing decoy deployment actions by automatically generating and deploying a high-interaction honeypot system to mislead attackers and collect attack intelligence; introducing a security policy consistency check into the policy adjustment process to ensure that defense actions do not disrupt normal network services; and feeding the defense effectiveness evaluation back to the agent as a reward signal through a feedback loop to continuously improve the policy and adapt to the ever-changing threat environment.

10. An adaptive network security monitoring system that integrates adversarial driving and cross-modal fusion, characterized in that it comprises: an adversarial generation and reinforcement learning collaborative module, used to dynamically generate adversarial samples of advanced persistent threats through generative adversarial networks and to simulate the attack response process in combination with reinforcement learning; a multimodal graph neural network modeling module, used to perform unified graph structure modeling on network traffic data, system log data, and asset topology data, extract cross-modal features, and perform threat detection; an adversarial training module, used to perform adversarial training on the multimodal graph neural network using the adversarial examples, thereby enhancing the model's robustness against covert attacks; and a deep reinforcement learning agent module, used to automatically learn and execute defense strategy adjustment actions based on threat detection results, thereby achieving automated closed-loop defense; wherein the modules work together to achieve adaptive network security monitoring through adversarial driving and cross-modal fusion.