Deep learning based network intrusion detection method
By introducing DDPM to generate high-quality and diverse network traffic data and combining it with an improved residual block and channel attention mechanism, the problem of low detection rate of deep learning methods in network intrusion detection is solved, and efficient and accurate network intrusion detection is achieved.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- ZHEJIANG SCI-TECH UNIV
- Filing Date
- 2026-03-30
- Publication Date
- 2026-06-19
AI Technical Summary
Existing deep learning methods for network intrusion detection suffer from problems such as a low detection rate for minority attack traffic, a lack of diversity and authenticity in generated samples, unstable training, and high-performance hardware requirements. In particular, it is difficult to improve the accuracy and generalization ability of detection models when network traffic data is imbalanced.
A network intrusion detection method based on the Denoising Diffusion Probabilistic Model (DDPM) is adopted. High-quality and diverse network traffic data are generated through category label embedding and a non-uniform step size scheduling mechanism. The feature capture capability is improved by combining an improved residual block with a channel attention mechanism.
It significantly improves the accuracy and generalization ability of the detection model, enhances the ability to identify minority class attacks, reduces the number of iterations, and improves data generation and detection efficiency.
Figure CN122247700A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of network intrusion detection technology, and in particular relates to a network intrusion detection method based on deep learning.
Background Technology
[0003] In existing technologies, traditional machine learning-based intrusion detection methods have been widely used, but they have limitations when handling large-scale, high-dimensional, and complex network data. In recent years, deep learning methods have been introduced into the field of intrusion detection, with typical methods including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). These methods can automatically extract features and improve detection accuracy. However, existing deep learning methods still suffer from the problem that normal samples far outnumber abnormal samples in network traffic, and the models are often biased towards the majority class during training, resulting in a low detection rate for minority class attack traffic.
[0004] Existing methods mostly address this by modifying loss functions or sampling techniques, but these can easily lead to training instability or a distorted sample distribution. While GANs and their variants can generate synthetic data, they are prone to mode collapse or convergence difficulties during training, resulting in a lack of diversity and realism in the generated samples, which affects the generalization ability of the detection model. Some improved methods require complex optimization algorithms or high-performance hardware support, making it difficult to meet the application requirements of real-time network security detection.
[0005] In recent years, the Denoising Diffusion Probabilistic Model (DDPM) has gradually become a cutting-edge generative technology developing alongside generative adversarial networks (GANs). DDPM models the data distribution through progressive denoising, enabling the stable generation of high-quality synthetic samples and effectively avoiding the mode collapse problem common in GANs. Related research has shown that DDPM delivers superior generative performance in areas such as image generation, fault diagnosis, medical imaging, and natural language processing. However, the application of DDPM in network security intrusion detection is still in the exploratory stage. How to use DDPM to address the imbalance problem in network traffic data and improve the accuracy and generalization ability of detection models while maintaining the authenticity of the original features remains an unsolved technical problem.
Summary of the Invention
[0006] To address the aforementioned problems in existing technologies, this invention provides a network intrusion detection method based on deep learning and a Denoising Diffusion Probabilistic Model (DDPM) (step 2 details the specific construction and implementation of the DDPM model), and applies this method to a network intrusion detection system. This method can effectively improve the quality and diversity of generated samples even when the number of samples per class is imbalanced, thereby enhancing the application value of the data in network security detection and traffic modeling.
[0007] The present invention adopts the following technical solution:
[0008] The specific steps of the deep learning-based network intrusion detection method are as follows:
[0009] Step 1: Preprocess the raw data and then divide it into training set, validation set and test set;
[0010] Step 2: Input the preprocessed training set data into the data generation model for processing. The resulting training set data contains high-dimensional feature representations of both local structural information and global dependencies.
[0011] Step 3: After merging the data generated in Step 2 and the training set data in Step 1, input them into the intrusion detection model for training. Then, use the validation set to adjust the hyperparameters and optimize the model to obtain an intrusion detection model that meets the optimization criteria. Use this intrusion detection model to detect network intrusions and obtain the network attack type identification results.
[0012] Preferably, in step 1, 70% of the data samples are used as the training set, 10% as the validation set, and 20% as the test set.
[0013] Preferably, in step 1, the preprocessing is as follows: the numerical features of the data are mapped to a standard normal distribution through Gaussian quantile transformation, the processed features are then concatenated to form a unified data matrix, and deduplication is performed.
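The preprocessing in this paragraph can be illustrated with a minimal sketch. It assumes a rank-based Gaussian quantile transform applied per feature column and exact-match row deduplication; the function names `gaussian_quantile_transform` and `deduplicate` are hypothetical and are not taken from the patent, which does not specify an implementation.

```python
from statistics import NormalDist


def gaussian_quantile_transform(values):
    """Map one numeric column to roughly N(0, 1) using its empirical ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    out = [0.0] * n
    i = 0
    while i < n:
        # Group tied values so that equal inputs receive equal outputs.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 0.5           # average rank, strictly inside (0, n)
        z = std_normal.inv_cdf(mid_rank / n)   # empirical quantile -> z-score
        for k in range(i, j + 1):
            out[order[k]] = z
        i = j + 1
    return out


def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Because the transform depends only on ranks, an extreme value such as a very large flow duration is mapped to a bounded z-score, which is one way to read the stated goal of reducing the impact of outliers.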
[0014] Preferably, in step 2, before inputting the preprocessed training set data into the data generation model, the preprocessed training set data is first subjected to forward diffusion and reverse diffusion. During the forward diffusion process, a linear or nonlinear noise scheduling function is set, and the noise intensity parameter at each step is denoted as $\beta_t$. This process gradually destroys the original data information over T iterations, making it approximate an isotropic Gaussian distribution; the probability distribution is expressed by the formula:
[0015] $$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$
[0016] Where $\mathcal{N}$ represents a Gaussian distribution, $x_t$ represents the sample state obtained from diffusion at step t, $x_{t-1}$ is the sample from the previous step, and I represents the identity covariance matrix;
[0017] During the reverse diffusion process, the input consists of the noise sample $x_t$, the category label embedding vector y, and the time step encoding t; the formula for reverse diffusion is expressed as:
[0018] $$p_\theta(x_{t-1} \mid x_t, y, t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, y, t),\ \Sigma_\theta(x_t, y, t)\right)$$
[0019] Where $\mu_\theta$ and $\Sigma_\theta$ represent the mean and variance predicted by the neural network, respectively.
[0020] As a preferred approach, the reverse diffusion process divides the time into an early stage (70%) and a later stage (30%). The step size in the early stage is designed to be larger than the set value, and a logarithmic sampling strategy is adopted to enable the data generation model to remove most of the noise more quickly and achieve a rough sample profile. The step size in the later stage is gradually reduced, and linear sampling is adopted to facilitate more accurate recovery of data details in the fine stage.
[0021] Preferably, in step 2, the triplet input, namely the noise sample $x_t$, the category label embedding vector y, and the time step encoding t, is concatenated into a unified feature space and fed into the data generation model. In this model, the data first passes through a one-dimensional convolutional layer, then a max-pooling layer, followed by batch normalization, and finally a self-attention layer. The attention calculation formula is as follows:
[0022] $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
[0023] Where Q is the query, K is the key, V is the value, and $d_k$ represents the dimension of the key vector.
[0024] The softmax function is used to normalize the attention score, transforming it into a probability distribution, as shown in the following formula:
[0025] $$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
[0026] Where e is the base of the natural logarithm, n represents the dimension of the input vector, and $x_i$ represents the i-th element in the input vector;
[0027] Finally, an activation function is used to enhance the nonlinear expressive power, as shown in the following formula:
[0028] $$f(x) = \max(0, x)$$
[0029] Here, x represents the input value of a certain layer in the neural network.
[0030] Preferably, in the intrusion detection model of step 3, the input data first enters the first feature extraction unit, which includes the first residual block, a max pooling layer, and a channel attention mechanism. The channel attention first performs global average pooling on the feature map output by the residual block to extract the global information of each channel, obtaining the channel description vector Z. Then, it generates the attention weights for each channel through a two-layer fully connected network. The first fully connected layer performs dimensionality reduction and compression on Z and activates it with ReLU. The second fully connected layer restores the channel dimension and activates it with Sigmoid, obtaining the attention coefficients s for each channel.
[0031] $$s = \sigma\left(w_2\,\mathrm{ReLU}(w_1 Z)\right)$$
[0032] Where $w_1$ and $w_2$ are the weights of the fully connected layers, σ is the sigmoid activation function, and the attention coefficient for each channel is generated using the following formula:
[0033] $$\sigma(a) = \frac{1}{1 + e^{-a}}$$
[0034] Where a represents the output of the second fully connected layer;
[0035] Finally, the channel attention coefficients are multiplied channel by channel with the feature map output by the residual block to achieve feature recalibration;
[0036] The data then passes through the second and third residual feature extraction units in sequence. Each residual feature extraction unit consists of residual blocks and channel attention mechanisms to gradually extract deep features. Finally, the feature map output by the third residual feature extraction unit is flattened and input into the fully connected layer, and the final attack type identification result is output through classification.
[0037] Compared with the prior art, the present invention has the following significant technical advancements:
[0038] This invention proposes a network intrusion detection method based on a diffusion model, which can effectively improve detection efficiency. By introducing category label embedding, the model can conditionally control different attack types during the generation phase, thereby generating high-quality network traffic data that conforms to the actual attack distribution. Furthermore, a non-uniform step-size scheduling mechanism is used in the reverse diffusion process to effectively reduce the number of iterations while ensuring generation quality, thus improving data generation efficiency. Simultaneously, an improved residual block and channel attention mechanism is adopted in the model detection part, significantly improving the model's ability to capture key features and the accuracy of data generation.
Attached Figure Description
[0039] Figure 1 is a system block diagram of a preferred embodiment of the network intrusion detection method of the present invention.
[0040] Figure 2 is a flowchart of the intrusion detection model according to a preferred embodiment of the present invention.
[0041] Figure 3 is a flowchart of a preferred embodiment of the network intrusion detection method of the present invention.
[0042] Figure 4 is a flowchart of the data generation model of a preferred embodiment of the present invention.
[0043] Figure 5 is a visualization of the comparison between the present invention and existing technologies.
Detailed Implementation
[0044] To provide a clearer understanding of the technical solution of the present invention, the present invention will now be described in detail with reference to specific embodiments.
[0045] In the technical solution of this invention, a network intrusion detection method based on a diffusion model is adopted, and its implementation flow is shown in Figures 1 and 3, where the subscript T in $x_T$ denotes the total number of time steps, and the final classification has N different classification results.
[0046] The system implemented in this embodiment is run in a Python environment with TensorFlow 2.6 installed and managed by Conda. Data visualization is performed using the Matplotlib library. The hardware configuration includes a Windows 11 operating system, an NVIDIA RTX 4060 Ti graphics card with 8 GB of video memory, an Intel Core i5-13490F processor (2.50 GHz), and 32 GB of RAM.
[0047] Specifically, this embodiment presents a network intrusion detection method based on deep learning, with the following specific steps:
[0048] Step 1: Data Partitioning and Processing. 70% of the data samples are used as the training set, 10% as the validation set, and 20% as the test set. The training set is used for data generation and training, the validation set is used to adjust hyperparameters, optimize the model, and monitor overfitting during training, and the test set is used to verify classification performance. The original data contains continuous features, which need to be normalized to suit deep learning models. Numerical features are mapped to a standard normal distribution using Gaussian quantile transformation to reduce the impact of outliers on training. The processed features are then concatenated to form a unified data matrix, and deduplication is performed to ensure the integrity and uniqueness of the dataset. Stratified sampling is used when partitioning the training and test sets to ensure that the proportion of each class is consistent in both sets, thus avoiding detection model bias caused by class imbalance. Finally, the indices of the training and test sets are reset for easy batch reading and model input. This preprocessing workflow provides a standardized and reliable data foundation for model training.
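The stratified 70/10/20 partition described above can be sketched as follows. This is only an illustration under assumptions: the helper name `stratified_split`, the fixed seed, and the per-class rounding rule are not from the patent, which states only that stratified sampling keeps class proportions consistent.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]   # the remainder goes to the test set
    return train, val, test
```

Splitting inside each class separately is what keeps a rare class such as Bot represented in all three subsets instead of landing entirely in one of them by chance.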
[0049] Step 2: Feed the processed training set data into the data generation model (the "model" box in Figure 3). Before that, the preprocessed training set data undergoes forward and reverse diffusion. First, the data is input into the forward diffusion module for stepwise noise addition. This process uses a linear or non-linear noise scheduling function, and the noise intensity parameter at each step is denoted as $\beta_t$. This process gradually destroys the original data information over T iterations, making it approximate an isotropic Gaussian distribution. Its probability distribution can be expressed by the formula:
[0050] $$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$$
[0051] Where $\mathcal{N}$ represents a Gaussian distribution, $x_t$ represents the sample state obtained from diffusion at step t, and $x_{t-1}$ is the sample from the previous step; I represents the identity covariance matrix, which ensures that the noise is independent and isotropic in each dimension.
[0052] During the reverse diffusion process, the input consists of the noise sample $x_t$, the class label embedding vector y, and the time step encoding t. The class label embedding is fused with the features of the noisy samples as conditional information, enabling the model to explicitly distinguish different classes during the generation phase and thus to supplement the smaller sample classes in a targeted way. The goal of reverse diffusion is to gradually restore the original data distribution from the noise distribution; the basic formula can be expressed as:
[0053] $$p_\theta(x_{t-1} \mid x_t, y, t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, y, t),\ \Sigma_\theta(x_t, y, t)\right)$$
[0054] Where $\mu_\theta$ and $\Sigma_\theta$ represent the mean and variance predicted by the neural network, respectively.
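As an illustration of the forward process, the sketch below applies the standard DDPM one-step update x_t = sqrt(1 - β_t) · x_{t-1} + sqrt(β_t) · ε with ε drawn from N(0, I). The linear schedule endpoints (1e-4 and 0.02), T = 1000, and the function names are assumptions chosen for illustration; the patent does not state its schedule values.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly increasing noise intensities beta_1..beta_T (assumed endpoints)."""
    return np.linspace(beta_start, beta_end, T)


def forward_diffusion(x0, betas, rng):
    """Apply q(x_t | x_{t-1}) for t = 1..T and return the final sample x_T."""
    x = x0
    for beta_t in betas:
        noise = rng.standard_normal(x.shape)   # isotropic Gaussian noise
        x = np.sqrt(1.0 - beta_t) * x + np.sqrt(beta_t) * noise
    return x


rng = np.random.default_rng(0)
x0 = np.full((2000, 8), 3.0)                   # deliberately non-Gaussian start
x_T = forward_diffusion(x0, linear_beta_schedule(1000), rng)
```

After the last step the samples have mean close to 0 and standard deviation close to 1, which is the sense in which the original information is destroyed and the data approaches an isotropic Gaussian.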
[0055] To further improve generation speed and quality, this embodiment employs non-uniform scheduling. The core idea of the reverse diffusion process in terms of time scheduling is to divide the time into 70% for the early stage and 30% for the later stage. The early stage has a larger step size and uses a logarithmic sampling strategy, allowing the data generation model to remove most of the noise more quickly and achieve a rough sample outline. The later stage has a gradually decreasing step size and uses linear sampling, which facilitates more accurate recovery of data details in the fine-tuning stage.
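One possible reading of this non-uniform schedule is sketched below. It assumes that the 70% and 30% shares refer to the range of timesteps covered (not to the number of sampling steps), that the early stage uses geometrically (logarithmically) spaced timesteps, and that the later stage uses evenly spaced ones. The function name and the values T = 1000, 20 early steps, and 30 late steps are hypothetical.

```python
def nonuniform_timesteps(T=1000, n_early=20, n_late=30, early_frac=0.7):
    """Return a descending list of timesteps to visit during reverse diffusion."""
    boundary = round(T * (1.0 - early_frac))   # timestep where the later stage begins
    top = T - 1
    ratio = boundary / top
    # Early stage: log-spaced from T-1 toward the boundary, so jumps start large.
    early = [round(top * ratio ** (i / n_early)) for i in range(n_early)]
    # Later stage: linearly spaced from the boundary down to 0 with small jumps.
    late = [round(boundary * (1.0 - i / (n_late - 1))) for i in range(n_late)]
    return sorted(set(early + late), reverse=True)


steps = nonuniform_timesteps()
```

With these assumed settings the sampler visits 50 timesteps instead of 1000, which is how such a schedule reduces the number of iterations; whether generation quality is preserved depends on the trained model and is not something this sketch demonstrates.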
[0056] The above triplet input (i.e., the noise sample $x_t$, the category label embedding vector y, and the time step encoding t) is first concatenated into a unified feature space and then fed into the data generation model shown in Figure 4. The data generation model first uses a one-dimensional convolutional layer with a kernel size of 1x3, 64 output channels, and a stride of 4 to extract local temporal features. Then, it performs downsampling through a max-pooling layer with a pooling window of 3 and a stride of 4 to compress redundant information. Batch normalization is then applied to improve the stability of the feature distribution. Finally, a self-attention layer is introduced to capture global features. The basic idea is to map the query (Q), key (K), and value (V) to a single output, where the query, key, value, and output are all vectors. The attention calculation formula is as follows:
[0057] $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
[0058] Where $d_k$ represents the dimension of the key vector.
[0059] The softmax function is used to normalize the attention score, transforming it into a probability distribution, as shown in the following formula:
[0060] $$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
[0061] Where e is the base of the natural logarithm, n represents the dimension of the input vector, and $x_i$ represents the i-th element in the input vector. The denominator is the sum of all exponential terms, which ensures that each output lies in the interval (0,1) and that all outputs sum to 1.
[0062] Finally, an activation function is employed to enhance the nonlinear expressive power. ReLU is chosen as the activation function, and its formula is as follows:
[0063] $$f(x) = \max(0, x)$$
[0064] Here, x represents the input value of a certain layer in the neural network, that is, the weighted input, which is usually the result of the previous layer's output after a linear transformation.
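The attention, softmax, and ReLU operations described above can be sketched in NumPy as follows. The projection matrices `w_q`, `w_k`, and `w_v`, their sizes, and the single-head layout are assumptions for illustration; the patent does not give these details.

```python
import numpy as np


def softmax(x, axis=-1):
    """Normalize scores into a probability distribution along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)   # subtract the max for stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def relu(x):
    """f(x) = max(0, x), applied element-wise."""
    return np.maximum(0.0, x)


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a (positions, features) array."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = k.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k))       # one row of weights per query
    return weights @ v, weights
```

Each row of `weights` sums to 1, so every output position is a weighted average of all value vectors, which is what lets the layer capture global dependencies that the strided convolution alone would miss.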
[0065] After this series of operations, the data generation model obtains a high-dimensional feature representation that simultaneously contains local structural information and global dependencies, which is used to predict the mean and variance of the numerical features at the current time step.
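To make the downsampling concrete, the sketch below traces how the sequence length shrinks through the convolution (kernel 3, stride 4) and the max pooling (window 3, stride 4) described earlier. It assumes no padding, which the patent does not specify, and it uses 78 as an example length only because the dataset described later has 78 features; the real input also includes the label embedding and time step encoding, whose widths are not given.

```python
def output_length(length, window, stride):
    """Length after a 1-D convolution or pooling layer with no padding."""
    return (length - window) // stride + 1


def trace_lengths(input_length):
    """Return the sequence length after the convolution and after the pooling."""
    after_conv = output_length(input_length, window=3, stride=4)   # Conv1D
    after_pool = output_length(after_conv, window=3, stride=4)     # MaxPool1D
    return after_conv, after_pool


print(trace_lengths(78))   # (19, 5)
```

Under these assumptions, 78 positions shrink to 19 after the convolution and to 5 after pooling, so the self-attention layer operates on a short sequence of 64-channel vectors.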
[0066] Step 3: After merging the generated data and the original training set data, the merged training set is fed into the detection model for parameter training, and the validation set is used for hyperparameter tuning and model optimization. This step corresponds to the "Detection Model" box in Figure 3; the three small cubes within the box represent three stacked residual feature extraction units, whose specific structure is shown in Figure 2.
[0067] The input data first enters the first feature extraction unit, corresponding to the first small cube of the detection model in Figure 3, which contains the first residual block. This block extracts local features with a convolutional layer (kernel size 3x3, output channels 64, stride 4). After batch normalization and ReLU activation, a residual connection is introduced to directly add the input features to the convolutional output. Following the first residual block, a max-pooling layer is added to reduce computational overhead. Then, a channel attention mechanism is applied. Channel attention first performs global average pooling on the feature map output from the residual block to extract global information for each channel, obtaining the channel description vector Z. Then, a two-layer fully connected network generates attention weights for each channel. The first fully connected layer performs dimensionality reduction and compression on Z (compression ratio 16), followed by ReLU activation; the second fully connected layer restores the channel dimension, followed by Sigmoid activation, resulting in attention coefficients s for each channel.
[0068] $$s = \sigma\left(w_2\,\mathrm{ReLU}(w_1 Z)\right)$$
[0069] Where Z represents the channel features after global average pooling, $w_1$ and $w_2$ are the weights of the fully connected layers, σ is the sigmoid activation function, and the attention coefficient for each channel is generated using the following formula:
[0070] $$\sigma(a) = \frac{1}{1 + e^{-a}}$$
[0071] Where a represents the output of the second fully connected layer.
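The merging step can be illustrated with a small sketch that first decides how many synthetic samples each class needs and then appends them to the real training data. The target count per class and the helper names are hypothetical; the patent does not state how the number of generated samples per class is chosen.

```python
from collections import Counter


def synthetic_quota(labels, target_per_class):
    """Number of generated samples each class needs to reach the target count."""
    counts = Counter(labels)
    return {cls: max(0, target_per_class - n) for cls, n in counts.items()}


def merge_with_generated(real_rows, real_labels, gen_rows, gen_labels):
    """Append generated samples to the real training data without modifying it."""
    return real_rows + gen_rows, real_labels + gen_labels
```

Only the training set is augmented this way; the validation and test sets stay untouched so that evaluation reflects real traffic.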
[0072] Finally, the channel attention coefficients are multiplied channel by channel by the feature map output by the residual block to achieve feature recalibration.
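The channel attention described in this step follows the familiar squeeze-and-excitation pattern, and a minimal NumPy sketch is given below. It assumes a (length, channels) feature map, omits bias terms, and uses a row-vector convention (`z @ w1`) instead of the column-vector form in the formula; the function name is hypothetical. With 64 channels and a compression ratio of 16, the hidden layer has 4 units.

```python
import numpy as np


def channel_attention(feature_map, w1, w2):
    """Recalibrate a (length, channels) feature map with per-channel weights.

    w1 has shape (C, C // r) and w2 has shape (C // r, C), where r is the
    compression ratio.
    """
    z = feature_map.mean(axis=0)            # global average pooling -> (C,)
    hidden = np.maximum(0.0, z @ w1)        # first FC layer + ReLU (compress)
    a = hidden @ w2                         # second FC layer restores C channels
    s = 1.0 / (1.0 + np.exp(-a))            # sigmoid -> coefficients in (0, 1)
    return feature_map * s, s               # channel-wise multiplication
```

Because every coefficient lies strictly between 0 and 1, the mechanism can only attenuate channels relative to one another; informative channels keep most of their magnitude while less useful ones are suppressed.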
[0073] The data then sequentially passes through the second and third residual feature extraction units, which consist of residual blocks and channel attention mechanisms, and no longer include pooling layers, progressively extracting deep features. By stacking these three units, the original input information can be preserved while improving the network's expressive power. Finally, the feature map output from the third unit is flattened and input into a fully connected layer, where it is classified to output the final attack type identification result.
[0074] After the detection model is trained, it is evaluated using the independently partitioned test set from step 1. The test set is used only during the inference phase and does not participate in data generation or model training. It is used to measure the generalization performance of the final intrusion detection model on unseen samples, thereby verifying the effectiveness of the data augmentation strategy and classification model design.
[0075] To verify the effectiveness of the method of the present invention, an application example is given here.
[0076] Experiments were conducted on the CICIDS2017 dataset, which consists of network traffic data collected over five days. The first day contains only benign traffic, serving as a baseline for normal network behavior. In contrast, the following four days cover various types of network attacks, such as Brute Force FTP, Brute Force SSH, DoS, Heartbleed, WebAttack, Infiltration, Botnet, and DDoS. Each data instance consists of 78 feature attributes and one label attribute, which categorizes the sample as either normal traffic or belonging to a specific attack type.
[0077] First, the minority class samples were expanded, and the comparison of the data before and after the expansion is shown in Appendix Table 1.
[0078] Table 1 Comparison of minority sample size before and after expansion.
[0079]
[0080] To qualitatively evaluate the quality of the generated samples, this invention employs the t-SNE visualization method, mapping real and generated data to a low-dimensional space for comparative analysis of their distribution characteristics. Data samples generated by each method are then backfilled into the original CICIDS2017 dataset, and the data distribution before and after backfilling is compared and analyzed. The relevant visualization results are shown in Figure 5. As can be seen in Figure 5, the data generation model of this invention achieves clear boundaries in many categories, including BENIGN, PortScan, and DoS Hulk, and exhibits strong continuity in category distribution. The generated data closely matches the original data in density, without obvious over-concentration or sparse regions. Geometric consistency and clustering trends are well preserved, demonstrating excellent distribution alignment and structural fidelity.
[0081] On the CICIDS2017 dataset, this invention was compared with traditional oversampling algorithms (SMOTE, ADASYN) and deep learning-based generative models (CGAN, CVAE), and the results are shown in Appendix Table 2.
[0082] The accuracies of SMOTE and ADASYN were 0.9844 and 0.9850, respectively, with F1 scores of 0.9856 and 0.9864. In terms of recall, SMOTE was only 0.9475, ADASYN was 0.9675, while this invention achieved a high 0.9982. This indicates that traditional methods may introduce noise or overlap when generating samples, leading to a higher false negative rate. While CGAN performed reasonably well in recall, its low precision resulted in a low F1 score compared to other methods, indicating a higher number of false positives. CVAE outperformed CGAN overall, with an accuracy of 0.9879 and an F1 score of 0.9882, but still significantly lower than this invention's 0.9956. This invention achieved optimal values across all metrics, maintaining both the highest recall and precision. This demonstrates that the data generated by this invention is of higher quality, covering more attack samples while reducing false positives on legitimate traffic.
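The relationship between precision, recall, and F1 used in this comparison can be made explicit with a short sketch. The confusion counts in the example are invented for illustration and are not results from the patent.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts for one class."""
    precision = tp / (tp + fp)    # share of raised alerts that are real attacks
    recall = tp / (tp + fn)       # share of real attacks that are detected
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 attacks caught, 10 false alarms, 30 attacks missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

Since F1 is the harmonic mean of the two quantities, it is pulled toward the lower one, which is why a method with reasonable recall but low precision still ends up with a low F1 score.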
[0083] Table 2 Comparison of Results
[0084]
[0085] Detailed F1 scores for different methods are shown in Appendix Table 3. This invention achieved good classification results in most categories. For the BENIGN category, which has the largest traffic share, this invention achieved an F1 score of 0.9973, superior to CVAE's 0.9930 and other comparative methods. In denial-of-service attacks such as DDoS and DoS Hulk, this invention achieved scores of 0.9987 and 0.9892 respectively, comparable to the ADASYN and CGAN models. Furthermore, this invention also demonstrated performance advantages in identifying long-tailed distributions and sparse samples. For example, in the SSH-Patator category, compared to SMOTE's 0.6633 and CVAE's 0.7265, this invention achieved a score of 0.9723. In addition, for categories such as Bot and DoS Slowhttptest, this invention also achieved relatively good results with scores of 0.5612 and 0.9301 respectively, superior to the CGAN model, which exhibits more volatile performance. In summary, it can be seen that the synthetic samples generated by this invention help improve the feature distribution of minority classes, thereby enhancing the overall recognition performance of the model across different categories.
[0086] Table 3 Comparison of F1 scores
[0087]
[0088] In summary, this invention balances generation quality, computational efficiency, and deployability, providing an efficient and scalable solution for the field of network intrusion detection technology. In practical applications, this invention enhances the system's ability to identify minority-class attacks, improves network defense levels, and offers advantages in ensuring data security and reducing operational costs. Furthermore, this invention exhibits good adaptability and scalability, allowing for flexible application in network environments of varying sizes and complexities, and integration with existing security architectures. Its widespread application will not only provide solid technical support for intelligent network protection but also drive the development of network security systems towards greater intelligence, efficiency, and reliability, offering innovative solutions to address increasingly severe network security challenges.
[0089] The above description is merely a detailed explanation of preferred embodiments and principles of the present invention. For those skilled in the art, there may be changes in specific implementation methods based on the ideas provided by the present invention, and these changes should also be considered within the scope of protection of the present invention.
Claims
1. A deep learning-based network intrusion detection method, characterized by: The specific steps are as follows: Step 1: First, divide the original data into training set, validation set and test set according to the set ratio, and then preprocess each dataset separately; Step 2: Input the preprocessed training set data into the data generation model for processing. The resulting training set data contains high-dimensional feature representations of both local structural information and global dependencies. Step 3: After merging the data generated in Step 2 and the training set data in Step 1, input them into the intrusion detection model for training. Then, use the validation set to adjust the hyperparameters and optimize the model to obtain an intrusion detection model that meets the optimization criteria. Use this intrusion detection model to detect network intrusions and obtain the network attack type identification results.
2. The deep learning-based network intrusion detection method as described in claim 1, characterized in that, In step 1, 70% of the data samples are used as the training set, 10% as the validation set, and 20% as the test set.
3. The deep learning-based network intrusion detection method as described in claim 1 or 2, characterized in that, In step 1, the preprocessing is as follows: the numerical features of the data are mapped to a standard normal distribution through Gaussian quantile transformation, the processed features are then concatenated to form a unified data matrix, and deduplication is performed.
4. The deep learning-based network intrusion detection method as described in claim 1, characterized in that, In step 2, before inputting the preprocessed training data into the data generation model, the preprocessed training data is first subjected to forward and reverse diffusion. During the forward diffusion process, a linear or nonlinear noise scheduling function is set, and the noise intensity parameter at each step is denoted as $\beta_t$. This process gradually destroys the original data information over T iterations, making it approximate an isotropic Gaussian distribution; the probability distribution is expressed by the formula: $q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right)$; Where $\mathcal{N}$ represents a Gaussian distribution, $x_t$ represents the sample state obtained from diffusion at step t, $x_{t-1}$ is the sample from the previous step, and I represents the identity covariance matrix; During the reverse diffusion process, the input consists of the noise sample $x_t$, the category label embedding vector y, and the time step encoding t; the formula for reverse diffusion is expressed as: $p_\theta(x_{t-1} \mid x_t, y, t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, y, t),\ \Sigma_\theta(x_t, y, t)\right)$; Where $\mu_\theta$ and $\Sigma_\theta$ represent the mean and variance predicted by the neural network, respectively.
5. The deep learning-based network intrusion detection method as described in claim 4, characterized in that, In the reverse diffusion process, the time is divided into an early stage (70%) and a later stage (30%). The step size in the early stage is designed to be larger than the set value, and a logarithmic sampling strategy is adopted to enable the data generation model to remove most of the noise more quickly and achieve a rough sample profile. In the later stage, the step size is gradually reduced and linear sampling is adopted to facilitate more accurate recovery of data details in the fine stage.
6. The deep learning-based network intrusion detection method as described in claim 4 or 5, characterized in that, In step 2, the triplet input, namely the noise sample $x_t$, the category label embedding vector y, and the time step encoding t, is concatenated into a unified feature space and fed into the data generation model. In this model, the data first passes through a one-dimensional convolutional layer, then a max-pooling layer, followed by batch normalization, and finally a self-attention layer. The attention calculation formula is as follows: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(QK^{T} / \sqrt{d_k}\right)V$; Where Q is the query, K is the key, V is the value, and $d_k$ represents the dimension of the key vector. The softmax function is used to normalize the attention score, transforming it into a probability distribution, as shown in the following formula: $\mathrm{softmax}(x_i) = e^{x_i} / \sum_{j=1}^{n} e^{x_j}$; Where e is the base of the natural logarithm, n represents the dimension of the input vector, and $x_i$ represents the i-th element in the input vector; Finally, an activation function is used to enhance the nonlinear expressive power, as shown in the following formula: $f(x) = \max(0, x)$; Here, x represents the input value of a certain layer in the neural network.
7. The deep learning-based network intrusion detection method as described in claim 1, characterized in that, In the intrusion detection model of step 3, the input data first enters the first feature extraction unit, which includes the first residual block, a max pooling layer, and a channel attention mechanism. The channel attention mechanism first performs global average pooling on the feature map output by the residual block to extract global information for each channel, obtaining the channel description vector Z. Then, it generates the attention weights for each channel through a two-layer fully connected network. The first fully connected layer performs dimensionality reduction and compression on Z and activates it with ReLU. The second fully connected layer restores the channel dimension and activates it with Sigmoid, obtaining the attention coefficients s for each channel: $s = \sigma\left(w_2\,\mathrm{ReLU}(w_1 Z)\right)$; Where $w_1$ and $w_2$ are the weights of the fully connected layers, σ is the sigmoid activation function, and the attention coefficient for each channel is generated using the following formula: $\sigma(a) = 1 / (1 + e^{-a})$; Where a represents the output of the second fully connected layer; Finally, the channel attention coefficients are multiplied channel by channel with the feature map output by the residual block to achieve feature recalibration; The data then passes through the second and third residual feature extraction units in sequence. Each residual feature extraction unit consists of residual blocks and channel attention mechanisms to gradually extract deep features. Finally, the feature map output by the third residual feature extraction unit is flattened and input into the fully connected layer, and the final attack type identification result is output through classification.