An adaptive network attack prediction and defense method, device and readable storage medium

By employing multi-scale time windows and deep learning methods, combined with Markov decision processes, the problem of ignoring the characteristics of traffic data in network attack prediction and defense is solved, improving prediction accuracy and the reliability of defense strategies, and achieving effective defense against both short-term bursts and long-term attacks.

CN121984792B · Active · Publication Date: 2026-06-26 · SUZHOU UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SUZHOU UNIV
Filing Date
2026-04-07
Publication Date
2026-06-26

AI Technical Summary

Technical Problem

Existing technologies in network attack prediction and defense ignore the characteristics of network traffic data, such as multi-dimensional heterogeneity, strong non-stationarity, obvious burstiness, and irregular sequence length, which leads to attack prediction bias and affects the reliability of defense strategies.

Method used

We employ multi-scale time windows to slice the network data stream, combine deep learning methods for feature extraction and preliminary attack prediction, utilize Markov decision processes to obtain defense strategies, and optimize defense measures through convolutional neural networks and reinforcement learning frameworks.

Benefits of technology

It improves the accuracy of attack prediction and the reliability of defense strategies, can sensitively identify short-term burst attacks and long-term network attacks, reduces the impact of noise interference, and achieves more stable defense decisions.



Abstract

The application belongs to the technical field of network security and relates to an adaptive network attack prediction and defense method, device, and readable storage medium. A network data stream is sliced to obtain a time sequence, and time windows of different scales, each ending at the current time, are used to obtain a plurality of historical time-series sampling sequences. Feature extraction and preliminary attack prediction are performed on each historical time-series sampling sequence, and weights are assigned to the feature vectors. An attack risk probability interval and its width are obtained from the historical time-series sampling sequences, their feature vectors and weights, and preset upper and lower bounds of attack risk. From the attack risk probability intervals of the current and historical moments, a first attack risk probability sequence, a second attack risk probability sequence, a third attack risk probability sequence, and an attack risk fluctuation sequence are obtained. The defense strategy acquisition process is described as a Markov decision process: a state space, an action space, and a reward function are established to obtain a deep reinforcement learning framework, which is solved for the network attack defense measure at the current time.

Description

Technical Field

[0001] This invention relates to the field of network security technology, and in particular to an adaptive network attack prediction and defense method, apparatus, and computer-readable storage medium.

Background Technology

[0002] With the rapid development of cloud computing, the Internet of Things (IoT), and the Industrial Internet, network systems are expanding in scale, and business models are becoming increasingly diversified and dynamic. At the same time, network attack methods are becoming increasingly complex, with attack behaviors exhibiting characteristics such as enhanced concealment, longer attack chains, more phased attack steps, and dynamic fluctuations in attack intensity. Typical attacks include Distributed Denial of Service (DDoS) attacks, scanning and probing, malware propagation, data theft, and Advanced Persistent Threat (APT) attacks.

[0003] To ensure network system security, existing technologies typically employ intrusion detection systems (IDS), intrusion prevention systems (IPS), firewalls, traffic scrubbing devices, and security information and event management (SIEM) systems to detect and protect against attacks. Common intrusion detection and prevention systems often use rule bases or signature matching to identify attacks. For example, they match network packets, session behavior, or log events against preset attack signature rules. When a rule is triggered, an alarm is output or a blocking operation (such as blocking IP addresses or ports) is performed. These solutions are simple to implement and offer high real-time performance, but they rely heavily on manually maintained rule bases, making it difficult to identify unknown or variant attacks. Furthermore, policy configurations are usually fixed, resulting in poor adaptability. To address this issue, deep learning methods have been used in recent years for network intrusion detection and attack prediction. By modeling the time-series characteristics of network traffic, abnormal behavior identification and attack probability assessment can be achieved. However, network traffic data is characterized by multi-dimensional heterogeneity, strong non-stationarity, obvious burstiness, and irregular sequence length. If a time-series model is used directly to make point predictions that output only an attack category or a single attack risk score, it is difficult to fully characterize the fluctuations in traffic characteristics and the uncertainty of the prediction results. As a result, when traffic changes drastically because of abrupt attack changes or noise interference, the model is prone to prediction bias, which affects the reliability of subsequent defense strategies.

[0004] In summary, existing adaptive network attack prediction and defense methods neglect the characteristics of network traffic data, such as multi-dimensional heterogeneity, strong non-stationarity, obvious burstiness, and irregular sequence length. This can easily lead to attack prediction bias, thereby affecting the reliability of defense strategies.

Summary of the Invention

[0005] Therefore, the technical problem to be solved by the present invention is to overcome the problem that the existing adaptive network attack prediction and defense methods ignore the characteristics of network traffic data, such as multi-dimensional heterogeneity, strong non-stationarity, obvious burstiness, and irregular sequence length, which easily leads to attack prediction deviations and thus affects the reliability of the defense strategy.

[0006] To address the aforementioned technical problems, this invention provides an adaptive network attack prediction and defense method, comprising:

[0007] The network data stream is sliced at fixed time intervals to obtain time sequences with equal time intervals. Then, using time windows of different scales that each end at the current time, multiple historical time series sampling sequences of different lengths are obtained for the current time.

[0008] Feature extraction and preliminary attack prediction are performed on each historical time series sampling sequence at the current moment. Based on the preliminary attack prediction results, weights are assigned to the feature vectors of each historical time series sampling sequence.

[0009] The historical time series sampling sequences at the current moment, the feature vector and weight of each historical time series sampling sequence, the preset upper bound of attack risk, and the preset lower bound of attack risk are input into the deep temporal coding network, which outputs the attack risk probability interval at the current moment and the width of that interval.

[0010] Based on the maximum attack risk probability at the current and historical moments, the first attack risk probability sequence at the current moment is obtained; based on the minimum attack risk probability at the current and historical moments, the second attack risk probability sequence at the current moment is obtained; based on the attack risk probability interval width at the current and historical moments, the third attack risk probability sequence at the current moment is obtained; based on the difference between the attack risk probability interval widths of every two adjacent moments in the current and historical moments, the attack risk fluctuation sequence at the current moment is obtained.

[0011] Based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence, the process of obtaining the defense strategy at the current moment is described as a Markov decision process. The state space, action space, and reward function are established to obtain a deep reinforcement learning framework, which is solved for the network attack defense measures at the current moment.

[0012] Preferably, the process of obtaining time windows of different scales includes:

[0013] The first-scale time window is obtained based on the length of the time series, wherein the length of the first-scale time window is greater than or equal to 1/3 of the length of the time series and less than or equal to 1/2 of the length of the time series.

[0014] Using the first-scale time window, sampling forward from the current time as the endpoint, the first historical time series sampling sequence is obtained;

[0015] The mean and standard deviation of the first historical time series sampling sequence are calculated to obtain the coefficient of variation of the first historical time series sampling sequence. The coefficient of variation is used to scale the first scale time window to obtain the second scale time window.

[0016] The autocorrelation coefficient of the first historical time series sampling sequence is calculated using the autocorrelation function. Based on the autocorrelation coefficient, the first-scale time window is scaled to obtain the third-scale time window.

[0017] Preferably, feature extraction and preliminary attack prediction are performed on each historical time-series sampling sequence at the current moment, and weights are assigned to the feature vectors of each historical time-series sampling sequence based on the preliminary attack prediction results, including:

[0018] Each historical time-series sampling sequence is input into the first convolutional neural network for feature extraction, resulting in the feature vector of each historical time-series sampling sequence.

[0019] The feature vectors of each historical time series sampling sequence are input into the second convolutional neural network for preliminary attack prediction, and the attack risk probability and prediction confidence of each historical time series sampling sequence are obtained.

[0020] Based on the attack risk probability and the confidence of the prediction results, each historical time series sampling sequence is clustered to obtain at least two historical time series sampling sequence clusters;

[0021] Based on the mean attack risk probability and the mean prediction confidence of all historical time series sampled sequences in each historical time series sampled sequence cluster, the attack risk probability and prediction confidence of each historical time series sampled sequence cluster are obtained, which serve as the cluster center of that historical time series sampled sequence cluster.

[0022] Based on the attack risk probability and prediction confidence of each historical time-series sampled sequence in each historical time-series sampled sequence cluster, the distance from the historical time-series sampled sequence to the cluster center of the historical time-series sampled sequence cluster is calculated.

[0023] Based on the average distance from all historical time-series sampled sequences to the cluster center in each historical time-series sampled sequence cluster, the intra-class density of each historical time-series sampled sequence cluster is obtained;

[0024] All historical time series sampling sequence clusters are sorted in descending order of intra-class density, and weights are assigned to the feature vectors of historical time series sampling sequences in each historical time series sampling sequence cluster based on the sorting results.

[0025] Here, the weight assigned to the feature vectors of the historical time-series sampled sequences in the cluster ranked first is greater than the weight assigned to the feature vectors of the historical time-series sampled sequences in the cluster ranked second.
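The weighting procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the application does not name a clustering algorithm, so a plain threshold split at 0.5 on both axes stands in for the clustering step; intra-class density is assumed to be the inverse of the mean distance to the cluster center; and the rank-to-weight rule is hypothetical. The function name `cluster_weights` and the sample values are likewise invented for the example.

```python
import numpy as np


def cluster_weights(probs, confs, eps=1e-6):
    """Group sequences by (attack risk probability, confidence) and weight them by cluster density."""
    points = np.column_stack([probs, confs]).astype(float)
    # Assumed stand-in for the clustering step: split at 0.5 on both axes,
    # giving up to four groups (attack / no attack x reliable / unreliable).
    labels = (points[:, 0] >= 0.5).astype(int) * 2 + (points[:, 1] >= 0.5).astype(int)

    densities = {}
    for label in np.unique(labels):
        members = points[labels == label]
        center = members.mean(axis=0)  # cluster center = mean probability and mean confidence
        mean_dist = np.linalg.norm(members - center, axis=1).mean()
        # Assumed density definition: inverse of the mean distance to the center.
        densities[int(label)] = 1.0 / (mean_dist + eps)

    # Sort clusters by density, densest first, and give earlier ranks larger weights.
    ranked = sorted(densities, key=densities.get, reverse=True)
    k = len(ranked)
    rank_weight = {label: (k - i) / (k * (k + 1) / 2) for i, label in enumerate(ranked)}
    weights = np.array([rank_weight[int(label)] for label in labels])
    return labels, densities, weights


# Three tightly grouped "attack, reliable" sequences and three scattered benign ones.
probs = [0.90, 0.92, 0.88, 0.10, 0.40, 0.25]
confs = [0.90, 0.91, 0.89, 0.95, 0.55, 0.70]
labels, densities, weights = cluster_weights(probs, confs)
```

In this toy input the compact cluster receives the larger weight, which matches the stated intent that consistent sequences should dominate the later interval prediction.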

[0026] Preferably, based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence, the current defense strategy acquisition process is described as a Markov decision process, establishing a state space, action space, and reward function, including:

[0027] The first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, the attack risk fluctuation sequence, the traffic characteristics of network data flow, the system resource status, and the attack frequency are defined as the state space.

[0028] The attack warning threshold, system resource scheduling, defense strategy, and attack warning level are defined as the action space;

[0029] A reward function is constructed based on the reduction in the width of the attack risk probability interval, the system resource scheduling cost, and the frequency of defense strategy switching.

[0030] Preferably, a reward function is constructed based on the reduction in the attack risk probability interval width, system resource scheduling costs, and defense strategy switching frequency, including:

[0031] The decrease in the maximum probability of attack risk at adjacent time points and the reduction in the width of the attack risk probability interval are used as positive reward items;

[0032] The system resource scheduling cost and the frequency of defense strategy switching are used as penalty terms;

[0033] The reward function is constructed based on the difference between the weighted sum of the positive reward terms and the weighted sum of the penalty terms.
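A minimal Python sketch of this reward construction is given below. The weight values, the `RewardWeights` container, and the argument names are assumptions made for illustration; the application only states that the reward is the weighted sum of the positive reward terms minus the weighted sum of the penalty terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RewardWeights:
    # Hypothetical weights; the source does not specify values.
    max_prob_drop: float = 0.4
    width_drop: float = 0.3
    resource_cost: float = 0.2
    switch_freq: float = 0.1


def reward(prev_max_prob: float, curr_max_prob: float,
           prev_width: float, curr_width: float,
           resource_cost: float, switch_freq: float,
           w: Optional[RewardWeights] = None) -> float:
    """Weighted positive terms minus weighted penalty terms."""
    w = w or RewardWeights()
    # Positive terms: drop in maximum attack risk probability and shrinkage of the interval width.
    gain = (w.max_prob_drop * (prev_max_prob - curr_max_prob)
            + w.width_drop * (prev_width - curr_width))
    # Penalty terms: resource scheduling cost and defense strategy switching frequency.
    penalty = w.resource_cost * resource_cost + w.switch_freq * switch_freq
    return gain - penalty


example_reward = reward(prev_max_prob=0.8, curr_max_prob=0.6,
                        prev_width=0.3, curr_width=0.2,
                        resource_cost=0.5, switch_freq=1.0)
```

With these assumed weights, a defense action that lowers risk only slightly while switching strategy and consuming resources ends up with a negative reward, which is the trade-off the penalty terms are meant to express.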

[0034] Preferably, solving the deep reinforcement learning framework to obtain the network attack defense measures at the current moment includes:

[0035] S11: Combine the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, the attack risk fluctuation sequence, the traffic characteristics of the network data stream, the system resource status, and the attack frequency at the current moment, and input them into the main Q network as the state vector of time slot t to obtain the action vector of time slot t output by the main Q network.

[0036] S12: Execute the action vector of time slot t, and calculate the reward function value of time slot t and the state vector of time slot t+1; store the state vector of time slot t, the action vector of time slot t, the reward function value of time slot t, and the state vector of time slot t+1 as a tuple into the experience replay pool;

[0037] S13: Update t=t+1 and return to step S11 until the number of tuples in the experience replay pool reaches the preset number;

[0038] S14: Randomly draw a mini-batch consisting of a preset number of tuples from the experience replay pool, and input each tuple in the mini-batch into the main Q network and the target Q network respectively; calculate the value of the loss function based on the predicted Q value corresponding to each tuple output by the main Q network and the target Q value corresponding to each tuple output by the target Q network, and update the parameters of the main Q network by minimizing the value of the loss function.

[0039] S15: Return to step S11 to iteratively optimize the main Q network until the number of iterations reaches the preset number. Based on the action vector of time slot t output by the main Q network, obtain the network attack defense measures at the current moment.
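Steps S11 to S15 follow the general shape of deep Q-learning with an experience replay pool and a target network. The sketch below illustrates that loop under several assumptions that are not in the application: a linear Q-function replaces the deep network to keep the example short, the action is a single discrete index rather than a full action vector, the environment `toy_env_step` is invented, and the epsilon-greedy exploration, the learning rate, and the target-network synchronization period are hypothetical choices.

```python
import random
from collections import deque

import numpy as np

STATE_DIM, N_ACTIONS = 4, 3
GAMMA, LR, BATCH, SYNC_EVERY = 0.9, 0.01, 16, 20


class LinearQ:
    """Linear Q-function Q(s) = W s + b, standing in for the main/target Q network."""

    def __init__(self, rng):
        self.W = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))
        self.b = np.zeros(N_ACTIONS)

    def q_values(self, state):
        return self.W @ state + self.b

    def copy_from(self, other):
        self.W, self.b = other.W.copy(), other.b.copy()


def toy_env_step(state, action, rng):
    """Hypothetical environment: stronger actions reduce risk (state[0]) but cost more."""
    next_state = state.copy()
    drift = 0.05 - 0.1 * action + rng.normal(scale=0.05)
    next_state[0] = np.clip(state[0] + drift, 0.0, 1.0)
    reward = (state[0] - next_state[0]) - 0.02 * action
    return float(np.clip(reward, -1.0, 1.0)), next_state


def train(steps=300, seed=0):
    rng = np.random.default_rng(seed)
    random.seed(seed)
    main_q, target_q = LinearQ(rng), LinearQ(rng)
    target_q.copy_from(main_q)
    replay = deque(maxlen=1000)
    state = rng.uniform(size=STATE_DIM)
    for t in range(steps):
        # S11: choose an action from the main Q network (epsilon-greedy is an assumption).
        if rng.random() < 0.1:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(main_q.q_values(state)))
        # S12: execute the action and store the transition tuple in the replay pool.
        reward, next_state = toy_env_step(state, action, rng)
        replay.append((state, action, reward, next_state))
        state = next_state  # S13: advance to the next time slot.
        if len(replay) < BATCH:
            continue
        # S14: sample a mini-batch and take one gradient step per tuple on the
        # squared error between the predicted Q value and the target Q value.
        for s, a, r, s2 in random.sample(list(replay), BATCH):
            target = r + GAMMA * np.max(target_q.q_values(s2))
            td_error = main_q.q_values(s)[a] - target
            main_q.W[a] -= LR * td_error * s
            main_q.b[a] -= LR * td_error
        if t % SYNC_EVERY == 0:
            target_q.copy_from(main_q)  # Assumed periodic target-network sync.
    return main_q, state


# S15: after training, the greedy action for the current state is the defense measure.
main_q, current_state = train()
defense_action = int(np.argmax(main_q.q_values(current_state)))
```

A production system would replace `LinearQ` with a neural network and map each action index to a concrete combination of warning threshold, resource scheduling, defense strategy, and warning level.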

[0040] Preferably, after calculating the reward function value for time slot t, the method further includes:

[0041] If the reward function value of time slot t is less than -1, then the reward function value of time slot t is set to -1;

[0042] If the reward function value of time slot t is greater than 1, then the reward function value of time slot t is set to 1.
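The clipping rule in the two paragraphs above amounts to bounding the reward to the range [-1, 1]. A short Python sketch follows; the function name `clip_reward` is hypothetical.

```python
def clip_reward(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Bound the reward of time slot t to [low, high]."""
    return max(low, min(high, value))


clipped = [clip_reward(v) for v in (-3.2, 0.25, 7.0)]
```

Bounding the reward in this way keeps a single extreme time slot from dominating the Q-value targets, which is a common reason for reward clipping in Q-learning.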

[0043] Preferably, the state vector $s_t$ of time slot t is expressed as:

[0044] $$s_t = \left[P_t^{\max},\ P_t^{\min},\ W_t,\ F_t,\ X_t,\ R_t,\ f_t\right],$$

[0045] where $P_t^{\max}$ denotes the first attack risk probability sequence; $P_t^{\min}$ denotes the second attack risk probability sequence; $W_t$ denotes the third attack risk probability sequence; $F_t$ denotes the attack risk fluctuation sequence; $X_t$ denotes the traffic characteristics of the network data stream; $R_t$ denotes the system resource status; and $f_t$ denotes the attack frequency;

[0046] The action vector $a_t$ of time slot t is expressed as:

[0047] $$a_t = \left[\theta_t,\ u_t,\ \pi_t,\ l_t\right],$$

[0048] where $\theta_t$ denotes the attack warning threshold; $u_t$ denotes the system resource scheduling; $\pi_t$ denotes the defense strategy; and $l_t$ denotes the attack warning level;

[0049] The reward function value $r_t$ of time slot t is calculated as:

[0050] $$r_t = \omega_1\,\Delta p_t^{\max} + \omega_2\,\Delta w_t - \omega_3\,c_t - \omega_4\,n_t,$$

[0051] where $\Delta p_t^{\max}$ denotes the decrease in the maximum attack risk probability between adjacent time points; $\Delta w_t$ denotes the reduction in the width of the attack risk probability interval between adjacent time points; $c_t$ denotes the system resource scheduling cost; $n_t$ denotes the frequency of defense strategy switching; and $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$ denote the weights of $\Delta p_t^{\max}$, $\Delta w_t$, $c_t$, and $n_t$, respectively.
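To make the composition of the state vector concrete, the Python sketch below concatenates the four risk sequences with the traffic characteristics, the system resource status, and the attack frequency into one flat array. The fixed history length, the zero left-padding for short histories, and the function name `build_state` are assumptions for illustration; the application does not say how sequences of different lengths are packed.

```python
import numpy as np


def build_state(max_probs, min_probs, widths, fluctuations,
                traffic_features, resource_status, attack_frequency, history=4):
    """Flatten the state components of time slot t into a single vector."""

    def last_k(seq, k):
        # Keep the k most recent values; left-pad with zeros if history is short (assumption).
        arr = np.asarray(seq, dtype=float)[-k:]
        return np.pad(arr, (k - len(arr), 0))

    parts = [
        last_k(max_probs, history),         # first attack risk probability sequence
        last_k(min_probs, history),         # second attack risk probability sequence
        last_k(widths, history),            # third sequence (interval widths)
        last_k(fluctuations, history - 1),  # differences between adjacent widths
        np.asarray(traffic_features, dtype=float),
        np.asarray(resource_status, dtype=float),
        np.asarray([attack_frequency], dtype=float),
    ]
    return np.concatenate(parts)


state = build_state(
    max_probs=[0.5, 0.6], min_probs=[0.2, 0.3], widths=[0.3, 0.3],
    fluctuations=[0.0], traffic_features=[1.0, 2.0],
    resource_status=[0.7], attack_frequency=3, history=3,
)
```

A fixed-length layout of this kind is what allows the state to be fed to a Q network with a fixed input size.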

[0052] The present invention also provides an adaptive network attack prediction and defense device, comprising:

[0053] The multi-scale sampling module is used to slice the network data stream at fixed time intervals to obtain time sequences with equal time intervals. It then uses different scale time windows to sample forward from the current time to obtain multiple historical time sequence sampling sequences of different lengths at the current time.

[0054] The attack probability prediction and weight allocation module is used to extract features and make preliminary attack predictions for each historical time series sampling sequence at the current moment, and to allocate weights to the feature vectors of each historical time series sampling sequence based on the preliminary attack prediction results.

[0055] The attack risk probability interval prediction module is used to input the current historical time series sampling sequences, the feature vectors and weights of the current historical time series sampling sequences, the preset upper bound of attack risk, and the preset lower bound of attack risk into the deep temporal coding network, and output the attack risk probability interval and the width of the attack risk probability interval at the current moment.

[0056] The data sequence construction module is used to obtain the first attack risk probability sequence for the current moment based on the maximum attack risk probability of the current and historical moments; the second attack risk probability sequence for the current moment based on the minimum attack risk probability of the current and historical moments; the third attack risk probability sequence for the current moment based on the attack risk probability interval width of the current and historical moments; and the attack risk fluctuation sequence for the current moment based on the difference between the attack risk probability interval widths of every two adjacent moments in the current and historical moments.

[0057] The defense strategy acquisition module is used to describe the current defense strategy acquisition process as a Markov decision process based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence. It establishes the state space, action space, and reward function, obtains a deep reinforcement learning framework, and solves it for the network attack defense measures at the current moment.

[0058] The present invention also provides a computer-readable storage medium storing a computer program, which, when executed by a processor, implements the steps of the above-described adaptive network attack prediction and defense method.

[0059] The adaptive network attack prediction and defense method provided in this application has the following beneficial effects:

[0060] 1. This application designs multiple time windows of different scales to extract historical time-series sampling sequences of varying lengths at each moment. Small-scale time windows extract short-term data sequences covering the current moment and recent historical moments, while large-scale time windows extract long-term data sequences covering the current moment and long-term historical moments. This approach captures long-term attack trends, achieving high detection accuracy for long-term network attacks, and also identifies short-term burst attacks more sensitively without introducing excessive historical data noise. Convolutional networks are used for feature extraction and preliminary prediction to obtain the attack risk probability and prediction confidence of each historical time-series sampling sequence, roughly reflecting whether each sequence contains attack information. Weights can then be assigned to the historical time-series sampling sequences, improving the accuracy of the attack risk probability interval output by the deep temporal coding network. Network attacks are sudden, dynamically evolving, and highly uncertain, and defense decisions are sequential and interdependent: the effectiveness of the defense measures at the current moment directly affects the attack status and defense resource consumption at the next moment. Therefore, this application describes the defense strategy acquisition process as a Markov decision process, transforming the attack risk prediction results into executable defense decisions. Through the three-way linkage of state, action, and reward, the disconnect between attack prediction and defense is resolved. The method thus takes into account both the multi-dimensional heterogeneity, strong non-stationarity, obvious burstiness, and irregular sequence length of network traffic data and the sudden, dynamically evolving, and highly uncertain nature of network attacks, thereby improving the accuracy of attack prediction and the reliability of defense strategies.

[0061] 2. Through clustering, historical time-series sampling sequences with similar prediction results and similar confidence levels are grouped together, yielding four groups: containing attack information and reliable, containing attack information and unreliable, not containing attack information and reliable, and not containing attack information and unreliable. At the same time, the intra-cluster density of each cluster is calculated to determine whether the prediction results and confidence levels of the sequences within the cluster are compact, stable, and consistent. The higher the intra-cluster density, the more consistent and reliable the historical time-series sampling sequences in that cluster are. Higher weights are therefore assigned to the sequences in denser clusters, so that reliable sequences dominate the prediction of the deep temporal coding network, the noise interference of unreliable sequences is suppressed, and the accuracy of the attack risk probability interval output by the deep temporal coding network is improved.

Brief Description of the Drawings

[0062] To make the content of this invention easier to understand, the invention will be further described in detail below with reference to specific embodiments and accompanying drawings, wherein:

[0063] Figure 1 is a flowchart of the adaptive network attack prediction and defense method provided in this application;

[0064] Figure 2 is a schematic diagram of the adaptive network attack prediction and defense device provided in this application.

Detailed Description

[0065] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the present invention. However, the embodiments described are not intended to limit the present invention.

[0066] Referring to Figure 1, which shows the flowchart of the adaptive network attack prediction and defense method provided in this application, the method specifically includes steps S10 to S50:

[0067] S10: Slice the network data stream according to fixed time intervals to obtain time sequences with equal time intervals. Use time windows of different scales to sample forward from the current time to obtain multiple historical time series sampling sequences of different lengths at the current time.

[0068] S20: Perform feature extraction and preliminary attack prediction on each historical time series sampling sequence at the current moment, and assign weights to the feature vectors of each historical time series sampling sequence based on the preliminary attack prediction results.

[0069] S30: Input the current time-series sampling sequences, the feature vectors and weights of each historical time-series sampling sequence, the preset upper bound of attack risk, and the preset lower bound of attack risk into the deep temporal coding network, and output the attack risk probability interval and the width of the attack risk probability interval at the current time.

[0070] S40: Based on the maximum attack risk probability of the current and historical moments, obtain the first attack risk probability sequence of the current moment; based on the minimum attack risk probability of the current and historical moments, obtain the second attack risk probability sequence of the current moment; based on the attack risk probability interval width of the current and historical moments, obtain the third attack risk probability sequence of the current moment; based on the difference between the attack risk probability interval widths of every two adjacent moments in the current and historical moments, obtain the attack risk fluctuation sequence of the current moment.

[0071] Specifically, in step S30, based on each historical time-series sampling sequence and its feature vector and weight, the preset upper bound of attack risk, and the preset lower bound of attack risk, the attack risk probability interval and its interval width at each moment are obtained. In step S40, the first attack risk probability sequence at the current moment is composed of the maximum values of the attack risk probability intervals at the current and historical moments (i.e., the maximum attack risk probabilities), the second attack risk probability sequence at the current moment is composed of the minimum values of the attack risk probability intervals at the current and historical moments (i.e., the minimum attack risk probabilities), and the third attack risk probability sequence at the current moment is composed of the interval widths of the attack risk probability intervals at the current and historical moments.

[0072] For example, the attack risk probability interval at time t is $[p_t^{\min},\ p_t^{\max}]$, where $p_t^{\min}$ represents the minimum value of the attack risk probability interval at time t, i.e., the minimum attack risk probability at time t, and $p_t^{\max}$ represents the maximum value of the attack risk probability interval at time t, i.e., the maximum attack risk probability at time t; the width of the attack risk probability interval at time t is $w_t = p_t^{\max} - p_t^{\min}$. Then the first attack risk probability sequence at time t is $P_t^{\max} = (p_\tau^{\max})_{\tau \le t}$, the second attack risk probability sequence at time t is $P_t^{\min} = (p_\tau^{\min})_{\tau \le t}$, and the third attack risk probability sequence at time t is $W_t = (w_\tau)_{\tau \le t}$.
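The construction of the four sequences in step S40 can be sketched in a few lines of Python. The function name `build_risk_sequences` and the input layout (a list of `(minimum, maximum)` pairs ordered from oldest to current) are assumptions, as is the sign convention of the fluctuation sequence (later width minus earlier width).

```python
from typing import List, Sequence, Tuple


def build_risk_sequences(
    intervals: Sequence[Tuple[float, float]],
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """intervals: (min probability, max probability) per moment, oldest first, current last."""
    first = [hi for _, hi in intervals]        # maximum attack risk probabilities
    second = [lo for lo, _ in intervals]       # minimum attack risk probabilities
    third = [hi - lo for lo, hi in intervals]  # interval widths
    # Difference between the widths of every two adjacent moments.
    fluctuation = [later - earlier for earlier, later in zip(third, third[1:])]
    return first, second, third, fluctuation


first, second, third, fluctuation = build_risk_sequences(
    [(0.1, 0.4), (0.2, 0.6), (0.3, 0.5)]
)
```

For n moments the first three sequences have n entries each, while the fluctuation sequence has n - 1 entries because it is built from adjacent pairs.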

[0073] S50: Based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence, the current moment's defense strategy acquisition process is described as a Markov decision process. The state space, action space, and reward function are established to obtain a deep reinforcement learning framework, which is solved for the network attack defense measures at the current moment.

[0074] Specifically, network data streams are characterized by multi-dimensional heterogeneity, strong non-stationarity, significant burstiness, and irregular sequence lengths. Meanwhile, network attacks exhibit multi-scale, multi-period, and simultaneous burstiness and persistence. For example, DDoS attacks manifest as short-term, drastic mutations, with anomalies appearing only within a few moments, while slow penetration attacks show a long-term, gradual increase, requiring a longer historical time series for detection. Therefore, in step S10, this application designs multiple time windows of different scales to extract historical time series sampling sequences of varying lengths from each moment. Small-scale time windows extract short-term data sequences from the current moment and short-term historical moments, while large-scale time windows extract long-term data sequences from the current moment and long-term historical moments. This approach not only captures long-term attack trends and improves the detection accuracy of long-term network attacks but also more sensitively identifies short-term burst attacks without introducing excessive historical data noise.
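As a minimal sketch of step S10, the following Python code slices a stream into equal-width time slots and then cuts out windows of different lengths that all end at the current slot. The event format `(timestamp, byte_count)`, the function names, and the window lengths are illustrative assumptions and do not come from the application.

```python
from typing import Dict, List, Sequence, Tuple


def slice_stream(events: Sequence[Tuple[float, float]], interval: float) -> List[float]:
    """Aggregate (timestamp, byte_count) events into equal-width time slots."""
    if not events:
        return []
    start = min(ts for ts, _ in events)
    end = max(ts for ts, _ in events)
    n_slots = int((end - start) // interval) + 1
    series = [0.0] * n_slots
    for ts, size in events:
        series[int((ts - start) // interval)] += size
    return series


def multi_scale_samples(series: Sequence[float],
                        window_lengths: Sequence[int]) -> Dict[int, List[float]]:
    """For each window length, take the most recent slots ending at the current slot."""
    samples = {}
    for length in window_lengths:
        length = max(1, min(length, len(series)))  # keep the window inside the series
        samples[length] = list(series[-length:])
    return samples


# Quiet traffic followed by a burst in the last two seconds.
events = [(0.0, 100.0), (0.4, 50.0), (1.2, 70.0), (3.9, 900.0), (4.1, 950.0)]
series = slice_stream(events, interval=1.0)
samples = multi_scale_samples(series, window_lengths=[2, 4])
```

The short window here contains only the burst, while the longer window also includes the quiet period before it, which is the contrast between short-term and long-term views that the paragraph describes.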

[0075] Furthermore, the process of obtaining time windows at different scales includes steps 1-1 to 1-4:

[0076] Step 1-1: Obtain the first scale time window based on the length of the time series, wherein the length of the first scale time window is greater than or equal to 1/3 of the length of the time series and less than or equal to 1/2 of the length of the time series.

[0077] Step 1-2: Using the first scale time window, sample forward from the current time to obtain the first historical time series sampling sequence.

[0078] Step 1-3: Calculate the mean and standard deviation of the first historical time series sampling sequence to obtain the coefficient of variation of the first historical time series sampling sequence. Use the coefficient of variation to scale the first scale time window to obtain the second scale time window.

[0079] Step 1-4: Calculate the autocorrelation coefficient of the first historical time series sampling sequence using the autocorrelation function, and scale the first-scale time window based on the autocorrelation coefficient to obtain the third-scale time window.

[0080] The time window scales are designed as follows. First, since the length of the time series directly reflects the overall time span of the traffic, the scale of the first-scale time window is constrained to 1/3 to 1/2 of the time series length, matching the period of the traffic itself. This ensures sufficient effective historical information while avoiding the redundancy caused by an excessively long window. Second, the coefficient of variation of the first historical time-series sampling sequence captured by the first-scale time window is calculated to reflect its fluctuation intensity, and the first-scale time window is scaled using the coefficient of variation to obtain the second-scale time window. A larger coefficient of variation indicates severe fluctuations in the first historical time-series sampling sequence, meaning the current sequence may precede or coincide with an attack. The time window scale is therefore increased to collect data over a longer range, so that attacks can be detected more accurately from data mutations and their duration. A smaller coefficient of variation indicates that the first historical time-series sampling sequence fluctuates little, meaning the current sequence is close to normal traffic data, and only a small amount of historical data needs to be collected within a small time window. Finally, the autocorrelation coefficient of the first historical time-series sampling sequence is calculated, and the first-scale time window is scaled based on it to obtain the third-scale time window. A larger autocorrelation coefficient indicates that the data within the sequence is highly correlated and historical information remains valid: the current sequence has strong temporal dependence, and the trends and changes in historical data still strongly influence new data. The time window scale can therefore be increased to make full use of information from a longer historical period. A smaller autocorrelation coefficient indicates that the data within the sequence has weak dependence: historical data expires quickly and shows no continuous pattern, and long-term historical data has little influence on the current data, so the time window scale can be reduced to retain only short-term historical information.
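The window design described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not give the scaling formula, so the multiplicative rule in `scale_window`, the reference values `cv_ref` and `acf_ref`, the use of the lag-1 autocorrelation coefficient, and all function names are hypothetical.

```python
import numpy as np

def coefficient_of_variation(x):
    """Standard deviation divided by the absolute mean (0.0 if the mean is 0)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    return 0.0 if mean == 0 else float(x.std() / abs(mean))

def lag1_autocorrelation(x):
    """Sample autocorrelation coefficient at lag 1 (0.0 for a constant sequence)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = float(np.dot(centered, centered))
    if denom == 0.0:
        return 0.0
    return float(np.dot(centered[:-1], centered[1:]) / denom)

def scale_window(base_len, statistic, reference, series_len, gain=1.0):
    """Assumed rule: enlarge the window when the statistic exceeds the
    reference value, shrink it otherwise, and keep it inside [2, series_len]."""
    factor = 1.0 + gain * (statistic - reference)
    scaled = int(round(base_len * factor))
    return max(2, min(series_len, scaled))

def multi_scale_windows(series, base_fraction=0.5, cv_ref=0.5, acf_ref=0.5):
    """Return the lengths of the first-, second-, and third-scale windows.
    base_fraction should lie between 1/3 and 1/2, as required by the text."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    w1 = max(2, int(n * base_fraction))
    first_seq = series[-w1:]  # sample backward from the current time
    w2 = scale_window(w1, coefficient_of_variation(first_seq), cv_ref, n)
    w3 = scale_window(w1, lag1_autocorrelation(first_seq), acf_ref, n)
    return w1, w2, w3
```

With this rule, a flat traffic series shrinks the second- and third-scale windows, while a sequence containing a burst enlarges the second-scale window up to the full series length.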

[0081] Specifically, step S20 includes S200~S206:

[0082] S200: Input each historical time-series sampling sequence into the first convolutional neural network for feature extraction to obtain the feature vector of each historical time-series sampling sequence.

[0083] S201: Input the feature vectors of each historical time-series sampling sequence into the second convolutional neural network for preliminary attack prediction, and obtain the attack risk probability and prediction confidence of each historical time-series sampling sequence.

[0084] S202: Cluster each historical time series sampling sequence based on the attack risk probability and the confidence of the prediction results to obtain at least two historical time series sampling sequence clusters.

[0085] S203: Take the mean attack risk probability and the mean prediction confidence of all historical time-series sampling sequences in each historical time-series sampling sequence cluster as the attack risk probability and prediction confidence of that cluster, and use this pair as the cluster center of that cluster.

[0086] S204: For each historical time-series sampling sequence in each cluster, calculate the distance from the sequence to the cluster center of its cluster based on the attack risk probability and prediction confidence of the sequence.

[0087] S205: Obtain the intra-class density of each historical time-series sampling sequence cluster based on the average distance from all historical time-series sampling sequences in the cluster to the cluster center.

[0088] S206: Sort all historical time-series sampling sequence clusters in descending order of intra-class density, and assign weights to the feature vectors of the historical time-series sampling sequences in each cluster based on the sorting results.

[0089] Here, the weight of the feature vectors of the historical time-series sampling sequences in the cluster ranked first is greater than the weight of the feature vectors of the historical time-series sampling sequences in the cluster ranked second.

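[0090] For example, for each historical time-series sampling sequence cluster $C_k$ containing $M_k$ sequences, let $p_{k,m}$ and $c_{k,m}$ represent the attack risk probability and prediction confidence of the m-th historical time-series sampling sequence in the cluster, respectively;

[0091] The cluster center is $(\bar{p}_k, \bar{c}_k)$, where $\bar{p}_k = \frac{1}{M_k}\sum_{m=1}^{M_k} p_{k,m}$ and $\bar{c}_k = \frac{1}{M_k}\sum_{m=1}^{M_k} c_{k,m}$;

[0092] The mean distance from all historical time-series sampling sequences in the cluster to the cluster center is $\bar{d}_k = \frac{1}{M_k}\sum_{m=1}^{M_k} d_{k,m}$, where $d_{k,m}$ is the distance from the m-th sequence to the cluster center; the intra-class density $\rho_k$ of the cluster is obtained from $\bar{d}_k$.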

[0093] It should be noted that different historical time-series sampling sequences contain different attack information. To determine which sequences are more informative, this application first uses convolutional networks for feature extraction and preliminary prediction, obtaining the attack risk probability and prediction confidence of each historical time-series sampling sequence, which roughly indicate whether each sequence contains attack information. Clustering then groups sequences with similar prediction results and similar confidence into one category, giving groups such as: contains attack information and reliable; contains attack information and unreliable; does not contain attack information and reliable; does not contain attack information and unreliable. At the same time, the intra-class density of each cluster is calculated to determine whether the prediction results and confidences of the sequences within the cluster are compact, stable, and consistent. The higher the intra-class density, the more consistent and reliable the sequences in that cluster, so they are assigned higher weights. Reliable sequences thus dominate the prediction of the deep time-series coding network, the noise interference of unreliable sequences is suppressed, and the accuracy of the attack risk probability interval output by the deep time-series coding network is improved.
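Steps S203 to S206 can be illustrated with the short Python sketch below. Several details are assumptions rather than part of the application: cluster labels are taken as given (the clustering algorithm is not specified), the distance is Euclidean, the density is taken as the reciprocal of the mean distance, and the rank-based weights are one hypothetical way to make denser clusters weigh more.

```python
import numpy as np

def cluster_weights(probs, confs, labels, eps=1e-8):
    """Return one weight per sequence; sequences in denser clusters get larger weights."""
    points = np.column_stack([probs, confs]).astype(float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)

    density = {}
    for k in cluster_ids:
        members = points[labels == k]
        center = members.mean(axis=0)                     # S203: center = per-cluster means
        dists = np.linalg.norm(members - center, axis=1)  # S204: Euclidean distance (assumed)
        density[k] = 1.0 / (dists.mean() + eps)           # S205: density = 1 / mean distance (assumed)

    # S206: sort clusters by descending density, then hand out rank-based weights
    ranked = sorted(cluster_ids, key=lambda k: density[k], reverse=True)
    n = len(ranked)
    total = n * (n + 1) / 2
    rank_weight = {k: (n - i) / total for i, k in enumerate(ranked)}
    return np.array([rank_weight[k] for k in labels])
```

For two clusters this gives weights of 2/3 and 1/3, so the tightly grouped (more consistent) predictions dominate the weighted input to the downstream network.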

[0094] Furthermore, considering the characteristics of network attacks being sudden, dynamically evolving, and highly uncertain, and the temporal and correlated nature of defense decisions, the effectiveness of defense measures at the current moment will directly affect the attack status and defense resource consumption at the next moment. Therefore, this application describes the defense strategy acquisition process as a Markov decision process, transforming the attack risk prediction results into executable defense decisions, and solving the problem of the disconnect between attack prediction and defense through the three-dimensional linkage of state, action, and reward.

[0095] Specifically, based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence, the current defense strategy acquisition process is described as a Markov decision process, establishing a state space, action space, and reward function, including steps 2-1 to 2-3:

[0096] Step 2-1: Define the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, the attack risk fluctuation sequence, the traffic characteristics of network data flow, the system resource status, and the attack frequency as the state space.

[0097] Step 2-2: Define the attack warning threshold, system resource scheduling, defense strategy, and attack warning level as the action space.

[0098] For example, system resource scheduling includes adjusting the allocation ratio of CPU computing resources, allocating and limiting network bandwidth resources, dynamically adjusting memory usage thresholds, and adjusting the concurrency of defense computing tasks. Defense strategies include enhanced monitoring, traffic rate limiting, blocking suspicious IPs, connection isolation, traffic scrubbing, and enhanced access authentication.

[0099] Step 2-3: Construct a reward function based on the reduction in the attack risk probability interval width, the system resource scheduling cost, and the defense strategy switching frequency.

[0100] Specifically, the state space defined in this application covers all information on attack risk probability prediction and basic environmental information, including both the temporal trend of attack risk changes and the constraints of defense decisions, thus completely and accurately representing the current attack situation and system defense capabilities. The action space covers the three core defense links of early warning, resource scheduling, and strategy, forming a complete defense action system to ensure that defense measures can comprehensively cope with different network attacks. Finally, the defense effect and defense cost are quantified into a calculable reward function value, so that the optimal defense strategy at the current moment can be found in the iterative process.

[0101] Furthermore, step 2-3 includes steps 2-3-1 to 2-3-3:

[0102] Step 2-3-1: The decrease in the maximum attack risk probability at adjacent time points and the reduction in the width of the attack risk probability interval are used as positive reward items.

[0103] Step 2-3-2: Use system resource scheduling costs and defense strategy switching frequency as penalty items.

[0104] Step 2-3-3: Construct the reward function based on the difference between the weighted sum of the positive reward terms and the weighted sum of the penalty terms.

[0105] Specifically, the positive reward items set in this application directly correspond to the improvement effect of attack risk. A decrease in the maximum probability of attack risk means a reduction in the highest threat level of the attack, and a reduction in the width of the attack risk probability range means a reduction in attack risk fluctuations and more stable prediction results. Since system resources are limited, excessively high resource scheduling costs will lead to a decline in system performance and affect normal business operations. Furthermore, frequent switching of defense strategies will lead to system state disorder, which will not only increase system overhead but may also lead to defense vulnerabilities. Therefore, this application uses resource scheduling costs and the frequency of defense strategy switching as penalty items to avoid over-defense, maintain the stability of defense strategies as much as possible, improve the reliability and continuity of defense, and minimize resource costs.

[0106] Specifically, the state vector $s_t$ of time slot t is represented as:

[0107] $$s_t = \left(P_t^{\max},\ P_t^{\min},\ W_t,\ F_t,\ X_t,\ R_t,\ f_t\right),$$

[0108] where $P_t^{\max}$ represents the first attack risk probability sequence; $P_t^{\min}$ represents the second attack risk probability sequence; $W_t$ represents the third attack risk probability sequence; $F_t$ represents the attack risk fluctuation sequence; $X_t$ represents the traffic characteristics of the network data stream; $R_t$ represents the system resource status; $f_t$ represents the attack frequency;

[0109] The action vector $a_t$ of time slot t is represented as:

[0110] $$a_t = \left(\theta_t,\ u_t,\ \pi_t,\ l_t\right),$$

[0111] where $\theta_t$ represents the attack warning threshold; $u_t$ represents the system resource scheduling; $\pi_t$ represents the defense strategy; $l_t$ represents the attack warning level;

[0112] The reward function value $r_t$ of time slot t is calculated as:

[0113] $$r_t = \omega_1 \Delta P_t + \omega_2 \Delta W_t - \omega_3 C_t - \omega_4 N_t,$$

[0114] where $\Delta P_t$ represents the decrease in the maximum attack risk probability between adjacent moments; $\Delta W_t$ represents the reduction in the width of the attack risk probability interval between adjacent moments; $C_t$ represents the system resource scheduling cost; $N_t$ represents the defense strategy switching frequency; $\omega_1$ represents the weight of $\Delta P_t$; $\omega_2$ represents the weight of $\Delta W_t$; $\omega_3$ represents the weight of $C_t$; $\omega_4$ represents the weight of $N_t$.
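As a small worked illustration of the reward described above (the weighted sum of the positive reward terms minus the weighted sum of the penalty terms), consider the following Python sketch. The function names, the default weight values, and the convention that a "decrease" is the previous value minus the current value are assumptions made for the example.

```python
def decrease(previous, current):
    """Decrease between adjacent moments: positive when the value went down."""
    return previous - current

def reward(delta_p_max, delta_width, resource_cost, switch_freq,
           w1=0.4, w2=0.3, w3=0.2, w4=0.1):
    """Weighted positive reward terms minus weighted penalty terms.
    The weights w1..w4 are illustrative values, not taken from the application."""
    positive = w1 * delta_p_max + w2 * delta_width
    penalty = w3 * resource_cost + w4 * switch_freq
    return positive - penalty

# Example: maximum risk fell from 0.8 to 0.6 and the interval width from 0.3 to 0.2,
# at a resource cost of 0.5 and one strategy switch.
r = reward(decrease(0.8, 0.6), decrease(0.3, 0.2), resource_cost=0.5, switch_freq=1.0)
```

In this example the penalties slightly outweigh the risk reduction, so the reward is negative, which is the kind of over-defense signal the penalty terms are meant to provide.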

[0115] Furthermore, the network attack defense measures at the current moment are obtained by solving the deep reinforcement learning framework, including S11~S15:

[0116] S11: Combine the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, the attack risk fluctuation sequence, the traffic characteristics of the network data stream, the system resource status, and the attack frequency at the current moment, and input them into the main Q network as the state vector of time slot t to obtain the action vector of time slot t output by the main Q network.

[0117] S12: Execute the action vector of time slot t, and calculate the reward function value of time slot t and the state vector of time slot t+1; store the state vector of time slot t, the action vector of time slot t, the reward function value of time slot t, and the state vector of time slot t+1 as a tuple into the experience replay pool.

[0118] S13: Update t=t+1 and return to step S11 until the number of tuples in the experience replay pool reaches the preset number.

[0119] S14: Randomly select a sample batch (mini-batch) consisting of a preset number of tuples from the experience replay pool, and input each tuple in the sample batch into the main Q network and the target Q network respectively; calculate the value of the loss function based on the predicted Q value corresponding to each tuple output by the main Q network and the target Q value corresponding to each tuple output by the target Q network, and update the parameters of the main Q network by minimizing the value of the loss function.

[0120] S15: Return to step S11 to iteratively optimize the main Q network until the number of iterations reaches the preset number. Based on the action vector of time slot t output by the main Q network, obtain the network attack defense measures at the current moment.
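The loop in S11 to S15 follows the general pattern of deep Q-learning with an experience replay pool and a target network. The Python sketch below shows that pattern on a toy problem and is not the application's implementation: the linear `LinearQ` model stands in for the Q networks, `toy_env_step` is a hypothetical two-action environment, and the epsilon-greedy exploration, the periodic target-network synchronization, and all hyperparameters are assumptions.

```python
from collections import deque
import numpy as np

class LinearQ:
    """Tiny stand-in for a Q network: Q(s, .) = W @ s + b."""
    def __init__(self, state_dim, n_actions, rng):
        self.W = rng.normal(scale=0.1, size=(n_actions, state_dim))
        self.b = np.zeros(n_actions)

    def q_values(self, state):
        return self.W @ state + self.b

    def copy_from(self, other):
        self.W, self.b = other.W.copy(), other.b.copy()

def toy_env_step(state, action, rng):
    """Hypothetical environment: action 1 ('defend') lowers risk at a small cost."""
    risk = state[0]
    new_risk = float(np.clip(risk - 0.3 * action + 0.1 + rng.normal(scale=0.05), 0.0, 1.0))
    reward = float(np.clip((risk - new_risk) - 0.05 * action, -1.0, 1.0))
    return reward, np.array([new_risk, 1.0])

def train(n_iterations=200, pool_size=64, batch_size=16, gamma=0.9,
          lr=0.05, epsilon=0.2, sync_every=20, seed=0):
    rng = np.random.default_rng(seed)
    main_q, target_q = LinearQ(2, 2, rng), LinearQ(2, 2, rng)
    target_q.copy_from(main_q)
    pool = deque(maxlen=10 * pool_size)   # experience replay pool
    state = np.array([0.5, 1.0])          # [risk level, constant feature]
    for it in range(n_iterations):
        # S11-S13: act, observe, and store tuples until the pool holds enough of them
        while True:
            if rng.random() < epsilon:    # exploration (assumption, not in the text)
                action = int(rng.integers(2))
            else:
                action = int(np.argmax(main_q.q_values(state)))
            r, next_state = toy_env_step(state, action, rng)
            pool.append((state, action, r, next_state))
            state = next_state
            if len(pool) >= pool_size:
                break
        # S14: sample a mini-batch and reduce the squared TD error of the main network
        idx = rng.choice(len(pool), size=batch_size, replace=False)
        for s, a, r, s_next in (pool[int(i)] for i in idx):
            target = r + gamma * np.max(target_q.q_values(s_next))
            td_error = main_q.q_values(s)[a] - target
            main_q.W[a] -= lr * td_error * s
            main_q.b[a] -= lr * td_error
        # Periodically copy the main network into the target network (assumption)
        if (it + 1) % sync_every == 0:
            target_q.copy_from(main_q)
    return main_q  # S15: the trained main network selects the defense action
```

After training, `int(np.argmax(main_q.q_values(state)))` plays the role of reading the defense measure from the main Q network's output for the current state; in the application the action is a multi-component vector rather than a single index.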

[0121] Optionally, after calculating the reward function value for time slot t, the following may also be included:

[0122] If the reward function value of time slot t is less than -1, then the reward function value of time slot t is set to -1;

[0123] If the reward function value of time slot t is greater than 1, then the reward function value of time slot t is set to 1.

[0124] Specifically, in the process of network attack defense, if there is a very large positive reward, such as a sudden attack being quickly defended, or a small positive reward but a very large negative penalty, such as excessive resource scheduling costs or excessively high policy switching frequency, the loss function of the Q network will increase sharply, causing gradient explosion and leading to abnormal network parameter updates. Therefore, this application imposes a hard truncation constraint on the reward function value, limiting it to the range of [-1,1], thereby avoiding the problem of numerical instability and network parameter update failure caused by excessively large or small reward values.
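The hard truncation described above amounts to clamping the reward to [-1, 1]. A minimal Python sketch (the function name `clip_reward` is hypothetical):

```python
def clip_reward(r, low=-1.0, high=1.0):
    """Clamp the reward to [low, high] so extreme values cannot destabilize Q-network updates."""
    return max(low, min(high, r))
```

Values already inside the range pass through unchanged, so ordinary rewards keep their relative ordering and only outliers are flattened.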

[0125] Based on the adaptive network attack prediction and defense method provided in the above embodiments, this application also provides an adaptive network attack prediction and defense device. As shown in Figure 2, the device specifically includes:

[0126] The multi-scale sampling module 10 is used to slice the network data stream according to a fixed time interval to obtain a time sequence with equal time intervals. It uses different scale time windows to sample forward from the current time as the endpoint to obtain multiple historical time sequence sampling sequences of different lengths at the current time.

[0127] The attack probability prediction and weight allocation module 20 is used to extract features and make preliminary attack predictions for each historical time series sampling sequence at the current moment, and to allocate weights to the feature vectors of each historical time series sampling sequence based on the preliminary attack prediction results.

[0128] The attack risk probability interval prediction module 30 is used to input the current historical time series sampling sequences, the feature vectors and weights of the current historical time series sampling sequences, the preset attack risk upper bound, and the preset attack risk lower bound into the deep time series coding network, and output the attack risk probability interval and the width of the attack risk probability interval at the current moment.

[0129] The data sequence construction module 40 is used to obtain the first attack risk probability sequence at the current moment based on the maximum attack risk probability at the current and historical moments; to obtain the second attack risk probability sequence at the current moment based on the minimum attack risk probability at the current and historical moments; to obtain the third attack risk probability sequence at the current moment based on the attack risk probability interval width at the current and historical moments; and to obtain the attack risk fluctuation sequence at the current moment based on the difference between the attack risk probability interval widths of every two adjacent moments in the current and historical moments.

[0130] The defense strategy acquisition module 50 is used to describe the current defense strategy acquisition process as a Markov decision process based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence. It establishes the state space, action space, and reward function, obtains a deep learning framework, and solves for the network attack defense measures at the current moment.
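As an illustration of what the data sequence construction module 40 computes, the Python sketch below builds the four sequences from a history of attack risk probability intervals. It assumes that the maximum and minimum attack risk probabilities at each moment are the upper and lower bounds of that moment's interval, and that the fluctuation is the later width minus the earlier width; the function name and input layout are hypothetical.

```python
def build_sequences(intervals):
    """intervals: list of (lower, upper) attack risk probability bounds,
    ordered from the oldest historical moment to the current moment."""
    first = [upper for _, upper in intervals]             # maximum attack risk probabilities
    second = [lower for lower, _ in intervals]            # minimum attack risk probabilities
    third = [upper - lower for lower, upper in intervals] # interval widths
    # Difference between the widths of every two adjacent moments
    fluctuation = [later - earlier for earlier, later in zip(third, third[1:])]
    return first, second, third, fluctuation
```

For a history of T moments the first three sequences have length T and the fluctuation sequence has length T - 1; these are the sequences that the defense strategy acquisition module 50 places in the state space.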

[0131] This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the adaptive network attack prediction and defense method described above.

[0132] Those skilled in the art will understand that embodiments of this application can be provided as methods, systems, or computer program products. Therefore, this application can take the form of a completely hardware embodiment, a completely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, this application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

[0133] This application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this application. It will be understood that each process and/or block of the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0134] These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable storage medium produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0135] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

[0136] Obviously, the above embodiments are merely illustrative examples for clear explanation and are not intended to limit the implementation. Those skilled in the art will recognize that other variations or modifications can be made based on the above description. It is neither necessary nor possible to exhaustively list all possible implementations here. However, obvious variations or modifications derived therefrom are still within the scope of protection of this invention.

Claims

1. An adaptive network attack prediction and defense method, characterized in that it comprises: slicing the network data stream at fixed time intervals to obtain a time series with equal time intervals, and, using time windows of different scales, sampling forward with the current moment as the endpoint to obtain multiple historical time-series sampling sequences of different lengths at the current moment; performing feature extraction and preliminary attack prediction on each historical time-series sampling sequence at the current moment, and assigning weights to the feature vector of each historical time-series sampling sequence based on the preliminary attack prediction results; inputting the historical time-series sampling sequences at the current moment, the feature vectors and weights of each historical time-series sampling sequence, the preset attack risk upper bound, and the preset attack risk lower bound into the deep time-series coding network, and outputting the attack risk probability interval and the width of the attack risk probability interval at the current moment; obtaining the first attack risk probability sequence at the current moment based on the maximum attack risk probability at the current and historical moments; obtaining the second attack risk probability sequence at the current moment based on the minimum attack risk probability at the current and historical moments; obtaining the third attack risk probability sequence at the current moment based on the attack risk probability interval width at the current and historical moments; obtaining the attack risk fluctuation sequence at the current moment based on the difference between the attack risk probability interval widths of every two adjacent moments among the current and historical moments; and, based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence, describing the process of obtaining the defense strategy at the current moment as a Markov decision process, establishing the state space, action space, and reward function, obtaining a deep learning framework, and solving for the network attack defense measures at the current moment.

2. The adaptive network attack prediction and defense method according to claim 1, characterized in that the process of obtaining the time windows of different scales includes: obtaining the first-scale time window based on the length of the time series, wherein the length of the first-scale time window is greater than or equal to 1/3 of the length of the time series and less than or equal to 1/2 of the length of the time series; using the first-scale time window, sampling forward with the current moment as the endpoint to obtain the first historical time-series sampling sequence; calculating the mean and standard deviation of the first historical time-series sampling sequence to obtain the coefficient of variation of the first historical time-series sampling sequence, and scaling the first-scale time window using the coefficient of variation to obtain the second-scale time window; and calculating the autocorrelation coefficient of the first historical time-series sampling sequence using the autocorrelation function, and scaling the first-scale time window based on the autocorrelation coefficient to obtain the third-scale time window.

3. The adaptive network attack prediction and defense method according to claim 1, characterized in that performing feature extraction and preliminary attack prediction on each historical time-series sampling sequence at the current moment, and assigning weights to the feature vector of each historical time-series sampling sequence based on the preliminary attack prediction results, includes: inputting each historical time-series sampling sequence into the first convolutional neural network for feature extraction to obtain the feature vector of each historical time-series sampling sequence; inputting the feature vector of each historical time-series sampling sequence into the second convolutional neural network for preliminary attack prediction to obtain the attack risk probability and prediction confidence of each historical time-series sampling sequence; clustering the historical time-series sampling sequences based on the attack risk probability and prediction confidence to obtain at least two historical time-series sampling sequence clusters; obtaining the attack risk probability and prediction confidence of each historical time-series sampling sequence cluster based on the mean attack risk probability and the mean prediction confidence of all historical time-series sampling sequences in that cluster, and using them as the cluster center of that cluster; calculating, based on the attack risk probability and prediction confidence of each historical time-series sampling sequence in each cluster, the distance from the historical time-series sampling sequence to the cluster center of its cluster; obtaining the intra-class density of each historical time-series sampling sequence cluster based on the average distance from all historical time-series sampling sequences in the cluster to the cluster center; and sorting all historical time-series sampling sequence clusters in descending order of intra-class density, and assigning weights to the feature vectors of the historical time-series sampling sequences in each cluster based on the sorting results; wherein the weight of the feature vectors of the historical time-series sampling sequences in the cluster ranked first is greater than the weight of the feature vectors of the historical time-series sampling sequences in the cluster ranked second.

4. The adaptive network attack prediction and defense method according to claim 1, characterized in that describing the process of obtaining the defense strategy at the current moment as a Markov decision process based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence, and establishing the state space, action space, and reward function, includes: defining the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, the attack risk fluctuation sequence, the traffic characteristics of the network data stream, the system resource status, and the attack frequency as the state space; defining the attack warning threshold, system resource scheduling, defense strategy, and attack warning level as the action space; and constructing the reward function based on the reduction in the width of the attack risk probability interval, the system resource scheduling cost, and the defense strategy switching frequency.

5. The adaptive network attack prediction and defense method according to claim 4, characterized in that constructing the reward function based on the reduction in the width of the attack risk probability interval, the system resource scheduling cost, and the defense strategy switching frequency includes: using the decrease in the maximum attack risk probability at adjacent moments and the reduction in the width of the attack risk probability interval as positive reward items; using the system resource scheduling cost and the defense strategy switching frequency as penalty items; and constructing the reward function based on the difference between the weighted sum of the positive reward items and the weighted sum of the penalty items.

6. The adaptive network attack prediction and defense method according to claim 5, characterized in that solving for the network attack defense measures at the current moment using the deep reinforcement learning framework includes: S11: combining the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, the attack risk fluctuation sequence, the traffic characteristics of the network data stream, the system resource status, and the attack frequency at the current moment, and inputting them into the main Q network as the state vector of time slot t to obtain the action vector of time slot t output by the main Q network; S12: executing the action vector of time slot t, and calculating the reward function value of time slot t and the state vector of time slot t+1; storing the state vector of time slot t, the action vector of time slot t, the reward function value of time slot t, and the state vector of time slot t+1 as a tuple in the experience replay pool; S13: updating t=t+1 and returning to step S11 until the number of tuples in the experience replay pool reaches the preset number; S14: randomly selecting a sample batch (mini-batch) consisting of a preset number of tuples from the experience replay pool, and inputting each tuple in the sample batch into the main Q network and the target Q network respectively; calculating the value of the loss function based on the predicted Q value corresponding to each tuple output by the main Q network and the target Q value corresponding to each tuple output by the target Q network, and updating the parameters of the main Q network by minimizing the value of the loss function; S15: returning to step S11 to iteratively optimize the main Q network until the number of iterations reaches the preset number, and obtaining the network attack defense measures at the current moment based on the action vector of time slot t output by the main Q network.

7. The adaptive network attack prediction and defense method according to claim 6, characterized in that, after calculating the reward function value of time slot t, the method further includes: if the reward function value of time slot t is less than -1, setting the reward function value of time slot t to -1; and if the reward function value of time slot t is greater than 1, setting the reward function value of time slot t to 1.

8. The adaptive network attack prediction and defense method according to claim 6, characterized in that the state vector $s_t$ of time slot t is represented as: $s_t = \left(P_t^{\max},\ P_t^{\min},\ W_t,\ F_t,\ X_t,\ R_t,\ f_t\right)$, where $P_t^{\max}$ represents the first attack risk probability sequence; $P_t^{\min}$ represents the second attack risk probability sequence; $W_t$ represents the third attack risk probability sequence; $F_t$ represents the attack risk fluctuation sequence; $X_t$ represents the traffic characteristics of the network data stream; $R_t$ represents the system resource status; $f_t$ represents the attack frequency; the action vector $a_t$ of time slot t is represented as: $a_t = \left(\theta_t,\ u_t,\ \pi_t,\ l_t\right)$, where $\theta_t$ represents the attack warning threshold; $u_t$ represents the system resource scheduling; $\pi_t$ represents the defense strategy; $l_t$ represents the attack warning level; the reward function value $r_t$ of time slot t is calculated as: $r_t = \omega_1 \Delta P_t + \omega_2 \Delta W_t - \omega_3 C_t - \omega_4 N_t$, where $\Delta P_t$ represents the decrease in the maximum attack risk probability between adjacent moments; $\Delta W_t$ represents the reduction in the width of the attack risk probability interval between adjacent moments; $C_t$ represents the system resource scheduling cost; $N_t$ represents the defense strategy switching frequency; $\omega_1$ represents the weight of $\Delta P_t$; $\omega_2$ represents the weight of $\Delta W_t$; $\omega_3$ represents the weight of $C_t$; $\omega_4$ represents the weight of $N_t$.

9. An adaptive network attack prediction and defense device, characterized in that it comprises: a multi-scale sampling module, used to slice the network data stream at fixed time intervals to obtain a time series with equal time intervals, and, using time windows of different scales, to sample forward with the current moment as the endpoint to obtain multiple historical time-series sampling sequences of different lengths at the current moment; an attack probability prediction and weight allocation module, used to perform feature extraction and preliminary attack prediction on each historical time-series sampling sequence at the current moment, and to assign weights to the feature vector of each historical time-series sampling sequence based on the preliminary attack prediction results; an attack risk probability interval prediction module, used to input the historical time-series sampling sequences at the current moment, the feature vectors and weights of the historical time-series sampling sequences at the current moment, the preset attack risk upper bound, and the preset attack risk lower bound into the deep time-series coding network, and to output the attack risk probability interval and the width of the attack risk probability interval at the current moment; a data sequence construction module, used to obtain the first attack risk probability sequence at the current moment based on the maximum attack risk probability at the current and historical moments, to obtain the second attack risk probability sequence at the current moment based on the minimum attack risk probability at the current and historical moments, to obtain the third attack risk probability sequence at the current moment based on the attack risk probability interval width at the current and historical moments, and to obtain the attack risk fluctuation sequence at the current moment based on the difference between the attack risk probability interval widths of every two adjacent moments among the current and historical moments; and a defense strategy acquisition module, used to describe the process of obtaining the defense strategy at the current moment as a Markov decision process based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, and the attack risk fluctuation sequence, to establish the state space, action space, and reward function, to obtain a deep learning framework, and to solve for the network attack defense measures at the current moment.

10. A computer-readable storage medium, characterized in that, The computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the adaptive network attack prediction and defense method according to any one of claims 1 to 8.