Deep learning based network behavior adaptive feature extraction and detection method
By generating high-dimensional implicit feature vectors through deep learning and combining them with feature effectiveness evaluation, the problem of insufficient feature extraction quality evaluation in network behavior detection is solved, and the linkage optimization of feature extraction and anomaly detection is realized, thereby improving the detection effect.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI LIUYOU INFORMATION TECH CO LTD
- Filing Date
- 2026-03-26
- Publication Date
- 2026-06-19
AI Technical Summary
In existing technologies, network behavior detection lacks an effective feature extraction quality assessment mechanism and cannot adaptively adjust, resulting in insufficient targeting and adaptability of feature extraction, which affects the detection effect.
We employ a deep learning-based adaptive feature extraction method for network behavior. This method generates high-dimensional implicit feature vectors through an unsupervised convolutional neural network. By combining a feature validity evaluator and a behavior classification model, we achieve real-time verification and adaptive optimization of feature quality and establish a linkage mechanism between feature extraction and anomaly detection.
This improves the targeting and adaptability of feature extraction, reduces interference from invalid and redundant features, and enhances the accuracy and efficiency of abnormal behavior identification, forming a closed-loop optimization.
Figure CN122247693A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of network security detection technology, and in particular to a method for adaptive feature extraction and detection of network behavior based on deep learning.
Background Technology
[0002] Network behavior detection is an important means of ensuring network security. In existing technologies, raw traffic data packets in the target network environment are usually collected first. After simple data cleaning, structured data is obtained. Then, features are extracted from the data by manually designing features or using simple machine learning models. Finally, the extracted features are input into a classification model to achieve abnormal behavior recognition.
[0003] In existing technical solutions, the feature extraction process lacks an effective quality assessment mechanism, making it impossible to determine whether the extracted features accurately reflect the core attributes of network behavior. This results in some invalid or redundant features entering subsequent detection stages, affecting the detection effect. At the same time, network behavior features are dynamic and changing. In existing methods, the feature extraction parameters are fixed and cannot be adaptively adjusted according to feature quality and detection results. Furthermore, after anomaly detection, only abnormal behavior can be identified, and key features related to the anomaly cannot be fed back into the feature extraction process, resulting in insufficient targeting and adaptability of feature extraction.
[0004] Existing technologies struggle to accurately assess the quality of feature extraction and adaptively optimize parameters, and they cannot establish a linkage optimization mechanism between feature extraction and anomaly detection. As a result, the accuracy and efficiency of network behavior detection cannot meet the needs of complex network environments.
Summary of the Invention
[0005] The purpose of this invention is to address the shortcomings of existing technologies by proposing a deep learning-based adaptive feature extraction and detection method for network behavior.
[0006] To achieve the above objective, the present invention adopts the following technical solution: a deep learning-based network behavior adaptive feature extraction and detection method, comprising: Raw traffic data packets in the target network environment are collected and cleaned to obtain a structured network behavior base table; The network behavior base table is input into an unsupervised convolutional neural network, which performs deep encoding on the multidimensional attributes in the network behavior base table to automatically generate a set of high-dimensional implicit feature vectors. A feature effectiveness evaluator is constructed, which receives the high-dimensional implicit feature vectors and calculates a feature quality score based on the information entropy of the feature distribution and the inter-class dispersion. When the feature quality score is lower than the preset qualified threshold, the network layer depth and convolution kernel parameters of the unsupervised convolutional neural network are adjusted, and the process of generating the high-dimensional implicit feature vectors from the network behavior base table is re-executed until the feature quality score meets the qualified threshold. The high-dimensional implicit feature vectors that meet the qualified threshold are input into the pre-trained behavior classification model for inference to identify network behavior records with abnormal attributes. Feature tracing is performed on the network behavior records with abnormal attributes, the key feature dimensions that lead to the abnormal behavior are extracted, and the key feature dimensions are fed back to the unsupervised convolutional neural network as constraints for the next round of feature extraction.
[0007] As a further aspect of the present invention, the original traffic data packets are cleaned to obtain a structured network behavior base table, including: The original traffic data packet contains the source address, destination address, protocol type, and payload; The header information of the original traffic data packets is parsed, and the data packets are classified according to the transport layer protocol type. Session streams belonging to Hypertext Transfer Protocol, File Transfer Protocol and Simple Mail Transfer Protocol are extracted respectively. For each type of session flow, the time interval sequence of its data packets is calculated, and the time interval sequence is denoised based on the network reference delay to filter out abnormal time slices caused by network jitter. The denoised session stream is reassembled to merge scattered short connections into long connection sessions. The number of bytes, packets, and retransmissions for each long connection session are counted to form the structured network behavior base table.
[0008] As a further aspect of the present invention, the network behavior base table is input into an unsupervised convolutional neural network, which performs deep encoding on the multidimensional attributes in the network behavior base table to automatically generate a set of high-dimensional implicit feature vectors, including: The high-dimensional implicit feature vectors characterize the potential patterns of network behavior in time and space; The structured network behavior base table is converted into a two-dimensional feature tensor, where one dimension represents the time step and the other dimension represents the network behavior statistics within the time step. The encoder part of the unsupervised convolutional neural network is used to perform multi-scale convolution operations on the two-dimensional feature tensor to extract behavioral patterns within a local time window. A self-attention mechanism layer is introduced at the end of the encoder. This self-attention mechanism layer globally correlates the behavioral patterns within the local time window, enhancing the capture of long-range dependencies. The tensor processed by the self-attention mechanism layer is flattened and compressed through a fully connected layer to finally output the high-dimensional implicit feature vector.
[0009] As a further aspect of the present invention, a feature effectiveness evaluator is constructed. The feature effectiveness evaluator receives the high-dimensional implicit feature vectors and calculates a feature quality score based on the information entropy of the feature distribution and the inter-class dispersion, including: Principal component analysis is performed on the high-dimensional implicit feature vectors to reduce them to a low-dimensional space suitable for visualization. The distribution density of data points in the low-dimensional space is calculated, and the reciprocal of the distribution density is used as the quantized value of the information entropy. A known normal behavior feature subset and a known abnormal behavior feature subset are separated from the high-dimensional implicit feature vectors, the centroids of the two subsets are calculated respectively, and the Euclidean distance between the two centroids is used as the inter-class dispersion. The feature quality score is obtained by weighting and summing the quantized value of the information entropy with the inter-class dispersion, wherein the quantized value of the information entropy reflects the stability of the feature and the inter-class dispersion reflects the discriminative ability of the feature.
[0010] As a further aspect of the present invention, when the feature quality score is lower than a preset qualified threshold, the network layer depth and convolution kernel parameters of the unsupervised convolutional neural network are adjusted, and the process of generating the high-dimensional implicit feature vector from the network behavior base table is re-executed, including: If the feature quality score is lower than the qualified threshold, the distribution of the high-dimensional implicit feature vector in the dimensionality reduction space is analyzed to determine whether there is overfitting or underfitting. When overfitting is detected, the dropout rate of the unsupervised convolutional neural network is increased, the network layer depth is reduced, and the size of the convolutional kernel is decreased to simplify the network structure. When underfitting is detected, a residual connection module is inserted into the existing structure of the unsupervised convolutional neural network, and the size of the convolutional kernel is increased to increase the receptive field and improve the network's ability to express complex patterns. After adjusting the network structure, the weights of the unsupervised convolutional neural network are fine-tuned using the fed-back key feature dimensions as regularization terms, and then feature extraction is performed again starting from the network behavior base table.
[0011] As a further aspect of the present invention, the high-dimensional implicit feature vectors that meet the qualified threshold are input into a pre-trained behavior classification model for inference, identifying network behavior records with abnormal attributes, including: The behavior classification model is loaded; it is based on a deep belief network and has learned the decision boundaries between normal behavior and various intrusion behaviors during the training phase; The high-dimensional implicit feature vectors are divided into multiple feature blocks in chronological order and then sequentially input into the behavior classification model for sliding window detection. The behavior classification model outputs a behavior probability distribution for each feature block. When the probability value of the abnormal behavior category in the behavior probability distribution corresponding to a certain feature block exceeds a set threshold, the timestamp and source address corresponding to the feature block are marked as a network behavior record with abnormal attributes.
[0012] As a further aspect of the present invention, feature tracing is performed on the identified network behavior records with abnormal attributes to extract key feature dimensions that lead to the abnormal behavior, including: The activation values of the last hidden layer are extracted from the behavior classification model. The activation values correspond to the contribution of the high-dimensional implicit feature vector in the classification decision. Based on the specific meaning of each dimension in the high-dimensional implicit feature vector, the gradient contribution of each dimension to the probability value of the abnormal behavior category is calculated; The top few dimensions with the largest absolute values of gradient contribution are selected, and the original network behavior statistics corresponding to those dimensions are defined as the key feature dimensions that lead to abnormal behavior.
[0013] As a further aspect of the present invention, key feature dimensions leading to abnormal behavior are extracted, and these key feature dimensions are fed back into the unsupervised convolutional neural network as constraints for the next round of feature extraction, including: The values of the key feature dimensions are used as soft labels and added to the loss function of the unsupervised convolutional neural network to form a hybrid loss function, which simultaneously includes the reconstruction error and the deviation penalty term of the key feature dimensions. In the next round of feature extraction from the network behavior base table, the unsupervised convolutional neural network optimizes the network parameters and forces the generated high-dimensional implicit feature vectors to maintain consistency with the normal behavior pattern in key feature dimensions, thereby suppressing the recurrence of similar abnormal behaviors.
[0014] As a further aspect of the present invention, before inputting the high-dimensional implicit feature vector that meets the qualified threshold into the pre-trained behavior classification model for inference, a feature space mapping step is also included: Since the training data distribution of the unsupervised convolutional neural network differs from the data distribution of the current network environment, a domain adaptation layer is introduced. The domain adaptation layer receives the high-dimensional implicit feature vector and performs a linear transformation on it, mapping the source domain feature space to the target domain feature space, so that the mapped features maintain the original intra-class compactness and inter-class separation in the target domain, and then inputs the mapped features into the behavior classification model.
[0015] As a further aspect of the present invention, the method also includes periodic incremental updates to the behavior classification model: Every fixed time period, newly generated network behavior data that has been manually labeled is aggregated to form an incremental training set; The behavior classification model is retrained in small batches using the incremental training set to update the weight parameters in the model to adapt to the latest changes in network behavior patterns. After retraining, the updated behavior classification model is used to re-infer the historical high-dimensional implicit feature vectors to verify the stability of the model update. If it is stable, the updated behavior classification model is deployed online to replace the old version.
[0016] Compared with the prior art, the advantages and positive effects of the present invention are as follows: A feature effectiveness evaluator based on information entropy and inter-class dispersion of feature distribution is constructed. This evaluator receives high-dimensional implicit feature vectors generated by an unsupervised convolutional neural network and calculates feature quality scores. When the score is lower than a preset passing threshold, the network layer depth and convolution kernel parameters of the unsupervised convolutional neural network are adjusted, and the high-dimensional implicit feature vector generation process is repeated until the score is acceptable. This scheme enables real-time verification of feature extraction quality, provides clear quantitative basis for parameter adjustment, eliminates reliance on human experience, and can screen out high-dimensional implicit features that better match the core attributes of network behavior, reducing interference from invalid and redundant features. This makes the feature extraction process more targeted and adaptive, and compared with conventional techniques, it can effectively avoid detection bias caused by insufficient feature quality.
[0017] This approach involves tracing the features of identified network behavior records with anomalous attributes, extracting key feature dimensions that lead to the anomalous behavior, and then feeding these key feature dimensions back into an unsupervised convolutional neural network as constraints for the next round of feature extraction. This scheme establishes a linkage mechanism between feature extraction and anomaly detection, enabling subsequent feature extraction to focus on key dimensions related to anomalies, progressively optimizing the feature extraction direction, reducing redundant extraction of irrelevant features, and allowing the feature extraction process to adaptively adjust to changes in network behavior and anomaly detection results. Compared to conventional techniques, this approach creates a closed-loop optimization between feature extraction and anomaly detection, improving the efficiency of feature extraction and the accuracy of anomalous behavior identification.
Attached Figure Description
[0018] Figure 1 is a flowchart of the deep learning-based network behavior adaptive feature extraction and detection method described in this invention;
Figure 2 is a flowchart of generating high-dimensional implicit feature vectors with the unsupervised convolutional neural network;
Figure 3 is a time-series analysis graph of the network behavior anomaly detection probability;
Figure 4 is a graph of the iterative changes in information entropy and inter-class dispersion during feature effectiveness evaluation;
Figure 5 is a time-series analysis graph of the stability score of incremental updates to the behavior classification model.
Detailed Implementation
[0019] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0020] In the description of this invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," and "outer," etc., indicating orientation or positional relationships, are based on the orientation or positional relationships shown in the accompanying drawings and are only for the convenience of describing the invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, or be constructed and operated in a specific orientation, and therefore should not be construed as a limitation of the invention. Furthermore, in the description of this invention, "a plurality of" means two or more, unless otherwise explicitly specified.
[0021] Referring to Figure 1, the process involves collecting raw traffic data packets from the target network environment and performing data cleaning operations on these packets to obtain a structured network behavior base table. This base table is then fed into an unsupervised convolutional neural network (CNN), which performs deep encoding of the multidimensional attributes in the base table, automatically generating a set of high-dimensional implicit feature vectors. A feature effectiveness evaluator is constructed. This evaluator receives the generated high-dimensional implicit feature vectors and calculates a feature quality score based on the information entropy of the feature distribution and the inter-class dispersion. If the calculated feature quality score is lower than a preset acceptable threshold, the network layer depth and kernel parameters of the unsupervised CNN are adjusted, and the process of generating high-dimensional implicit feature vectors from the network behavior base table is repeated. This cycle continues until the feature quality score meets the acceptable threshold. The high-dimensional implicit feature vectors that meet the acceptable threshold are then input into a pre-trained behavior classification model for inference to identify network behavior records with abnormal attributes. Feature tracing is performed on the identified abnormal network behavior records to extract the key feature dimensions that led to the abnormal behavior. These key feature dimensions are then fed back into the original unsupervised CNN as constraints for its next round of feature extraction.
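The extract, score, and adjust cycle in this paragraph can be sketched as a small control loop. This is only an illustration: `extract`, `score`, and `adjust` are hypothetical caller-supplied stand-ins for the encoder, the evaluator, and the structure adjustment, and the `max_rounds` cap is an added safeguard that the source does not mention.

```python
def extract_until_qualified(base_table, config, extract, score, adjust,
                            threshold, max_rounds=10):
    """Repeat feature extraction until the quality score reaches the threshold.

    extract(base_table, config) -> features   (stand-in for the encoder)
    score(features) -> float                  (stand-in for the evaluator)
    adjust(config, features) -> new config    (stand-in for structure tuning)
    max_rounds is an assumed safety cap, not part of the described method.
    """
    for round_number in range(1, max_rounds + 1):
        features = extract(base_table, config)
        quality = score(features)
        if quality >= threshold:
            # Qualified features go on to the classification stage.
            return features, config, round_number
        # Below threshold: adjust the encoder and start again from the base table.
        config = adjust(config, features)
    raise RuntimeError("feature quality did not reach the threshold")
```

Passing the three stages in as callables keeps the loop independent of any particular network library.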
[0022] In one embodiment of the present invention (see Figure 2), the raw traffic data packets contain source address, destination address, protocol type, and payload fields. During data cleaning, the header information of the raw traffic data packets is first parsed, and the packets are initially classified according to the transport layer protocol type. Session streams belonging to Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), and Simple Mail Transfer Protocol (SMTP) are then extracted from the mixed traffic to form independent data streams based on the application protocol. For example, a specific session stream may contain multiple consecutive data packets with the same five-tuple information.
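As a rough illustration of the grouping step, the sketch below buckets already-parsed packet headers into session streams keyed by application protocol and five-tuple. The dictionary field names are hypothetical, and identifying the protocol from the well-known ports 80, 21, and 25 is an assumption made for brevity; the source does not say how the protocol is recognized.

```python
from collections import defaultdict

# Assumption: well-known server ports stand in for protocol identification.
PORT_TO_PROTOCOL = {80: "HTTP", 21: "FTP", 25: "SMTP"}

def group_session_flows(packets):
    """Group parsed packet headers into session streams.

    Each packet is a dict with the hypothetical keys
    src_ip, src_port, dst_ip, dst_port, transport, ts.
    """
    flows = defaultdict(list)
    for pkt in packets:
        protocol = (PORT_TO_PROTOCOL.get(pkt["dst_port"])
                    or PORT_TO_PROTOCOL.get(pkt["src_port"]))
        if protocol is None:
            continue  # not one of the three extracted protocols
        five_tuple = (pkt["src_ip"], pkt["src_port"],
                      pkt["dst_ip"], pkt["dst_port"], pkt["transport"])
        flows[(protocol, five_tuple)].append(pkt)
    # Keep each stream in time order for the later interval calculation.
    for stream in flows.values():
        stream.sort(key=lambda p: p["ts"])
    return dict(flows)
```

In this sketch the two directions of a connection end up in separate streams, because the five-tuple is used exactly as it appears in the packet.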
[0023] In some embodiments, for each classified session flow, it is necessary to calculate the time interval sequence of packet arrivals or transmissions within it. This time interval sequence reflects the rhythmic characteristics of network behavior. Based on a pre-measured or configured network baseline delay, a denoising operation is performed on the calculated raw time interval sequence to filter out abnormal time slices that deviate from the baseline delay range due to network background jitter. One optional denoising method is implemented using a threshold function, the expression of which is:
[0024]
$$\Delta t_i' = \begin{cases} \Delta t_i, & |\Delta t_i - T_{base}| \le \varepsilon \\ \text{filtered out}, & \text{otherwise} \end{cases}$$
Where: $\Delta t_i$ represents the $i$-th time interval value in the original time interval sequence, $T_{base}$ represents the preset network baseline delay, $\varepsilon$ represents the allowable delay fluctuation tolerance, and $\Delta t_i'$ represents the $i$-th time interval value after denoising. After noise reduction, the processed session stream undergoes session reassembly. The core of this process is to merge multiple scattered short connections originating from the same host pair and belonging to the same logical communication process into a unified long connection session based on protocol rules and time relevance. For each reassembled long connection session, its total number of bytes, total number of packets, and number of Transmission Control Protocol (TCP) retransmissions are counted. These statistical indicators are used as fields to ultimately form a single structured network behavior base table record. Multiple records together constitute the complete network behavior base table.
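A minimal sketch of the threshold-based denoising described above. It assumes that an interval is kept when it lies within the tolerance of the baseline delay and is dropped otherwise; the exact expression used in the method may differ. The function and parameter names are illustrative.

```python
def interval_sequence(timestamps):
    """Time gaps between consecutive packets of one session stream."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

def denoise_intervals(intervals, baseline_delay, tolerance):
    """Keep intervals within `tolerance` of `baseline_delay`.

    Assumption: out-of-range intervals are discarded rather than clipped
    or replaced by the baseline value.
    """
    return [dt for dt in intervals if abs(dt - baseline_delay) <= tolerance]
```

For example, with timestamps in milliseconds of `[0, 100, 210, 900, 1000]`, a baseline of 100 ms, and a tolerance of 50 ms, the 690 ms gap is removed and the other three intervals are kept.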
[0025] Referring to Figure 3, in the time-series analysis of network behavior anomaly detection, a deep learning-based behavior classification model identifies abnormal behavior through sliding window inference. Specifically, the model divides the high-dimensional implicit feature vector into blocks and outputs a behavior probability distribution for each block. The green curve represents the probability of normal behavior, the red curve represents the probability of abnormal behavior, and the orange dashed line represents the anomaly detection threshold. When the probability of abnormal behavior exceeds the threshold, the model marks the corresponding timestamp as an anomaly detection point with a black star. A total of five significant abnormal behaviors were identified, with their peak anomaly probabilities all exceeding 0.5 and moving inversely to the probability of normal behavior, which illustrates the model's ability to discriminate network behavior patterns. This visualization result can be used for subsequent feature tracing and model iteration optimization, providing a quantitative basis for network security situational awareness.
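The block-wise thresholding can be sketched as follows. The `classify` callable is a hypothetical stand-in for the trained behavior classification model and is assumed to return the abnormal-class probability for one block. Reporting the timestamp and source address of the first record in a flagged block is also an assumption.

```python
def detect_abnormal_blocks(records, classify, block_size, threshold):
    """Flag feature blocks whose abnormal probability exceeds the threshold.

    records: chronologically ordered dicts with the hypothetical keys
             timestamp, source, features.
    classify: callable taking a list of feature vectors and returning the
              probability of the abnormal class (stand-in for the model).
    """
    flagged = []
    for start in range(0, len(records), block_size):
        block = records[start:start + block_size]
        p_abnormal = classify([r["features"] for r in block])
        if p_abnormal > threshold:
            # Assumption: a block is reported by its first record.
            flagged.append((block[0]["timestamp"], block[0]["source"]))
    return flagged
```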
[0026] In practical implementation, converting the structured network behavior base table into an input format that the model can process is a prerequisite for generating high-dimensional implicit feature vectors. The implementation method involves converting the network behavior base table into a two-dimensional feature tensor. One dimension of the two-dimensional feature tensor represents the time step, i.e., a time window divided into fixed durations (e.g., 1 minute); the other dimension represents multiple network behavior statistical indicators within each time step, such as the number of different protocol sessions, average packet length, and average flow rate within that time window. This conversion process can be understood as reshaping tabular data into a matrix with a spatiotemporal structure. The core function of an unsupervised convolutional neural network is to deeply encode the input two-dimensional feature tensor, automatically learning and generating a set of high-dimensional implicit feature vectors. These vectors aim to characterize the potential patterns of network behavior in temporal evolution and spatial correlation. The encoder part of the unsupervised convolutional neural network consists of multiple stacked convolutional layers, performing multi-scale convolution operations on the two-dimensional feature tensor. Convolutional kernels of different scales can extract behavioral pattern features within different local time windows. For example, a smaller-scale convolutional kernel may focus on burst patterns over several consecutive time steps, while a larger-scale convolutional kernel can sense trend changes over a longer time range.
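To make the table-to-tensor conversion concrete, the sketch below buckets base table records into fixed time windows and computes three statistics per window. The record keys (`start_ts`, `bytes`, `packets`) and the choice of statistics (session count, average packet length, average flow rate) are illustrative assumptions based on the examples named in the paragraph.

```python
def build_feature_tensor(records, window_seconds=60):
    """Reshape base table records into a [time step][statistic] matrix.

    Each record is a dict with the hypothetical keys start_ts (seconds),
    bytes and packets. Columns: session count, average packet length,
    average flow rate in bytes per second.
    """
    if not records:
        return []
    t0 = min(r["start_ts"] for r in records)
    t_last = max(r["start_ts"] for r in records)
    n_steps = int((t_last - t0) // window_seconds) + 1
    totals = [[0, 0, 0] for _ in range(n_steps)]  # sessions, bytes, packets
    for r in records:
        step = int((r["start_ts"] - t0) // window_seconds)
        totals[step][0] += 1
        totals[step][1] += r["bytes"]
        totals[step][2] += r["packets"]
    tensor = []
    for sessions, total_bytes, total_packets in totals:
        avg_len = total_bytes / total_packets if total_packets else 0.0
        tensor.append([float(sessions), avg_len, total_bytes / window_seconds])
    return tensor
```

Windows with no sessions are kept as all-zero rows so that the time axis stays evenly spaced, which is what a convolution along that axis expects.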
[0027] In some embodiments, to enhance the model's ability to capture long-range dependencies, a self-attention mechanism layer is introduced at the end of the encoder. This self-attention mechanism layer receives an intermediate tensor containing multiple local behavioral pattern features output by the aforementioned convolutional layers, calculates attention weights between features at all positions within this tensor, and aggregates the features based on these weights, thereby achieving global association of behavioral patterns within a local time window. After processing by the self-attention mechanism layer, each position in the feature tensor incorporates the association information of the global context. Subsequently, the processed feature tensor is flattened in space, converting it into a one-dimensional feature vector. This one-dimensional vector is then subjected to nonlinear transformation and dimensionality compression through one or more fully connected layers, ultimately outputting a fixed-dimensional, dense, high-dimensional implicit feature vector.
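The attention, flatten, and compression steps can be illustrated with NumPy. This sketch uses plain scaled dot-product attention with the input itself as queries, keys, and values (no learned projections), a single fully connected layer, and `tanh` as the nonlinearity. All three are simplifying assumptions; the source does not specify them.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over positions.

    x has shape (positions, channels). Queries, keys and values are all x,
    a simplification: a trained layer would use learned projections.
    """
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights

def encode_tail(conv_features, w, b):
    """Attention, then flatten, then one fully connected layer."""
    attended, _ = self_attention(conv_features)
    flat = attended.reshape(-1)
    return np.tanh(w @ flat + b)  # tanh is an assumed nonlinearity
```

Each row of `weights` sums to one, so every output position is a weighted average of all positions, which is how global context reaches each local pattern.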
[0028] In one embodiment of the present invention, the constructed feature validity evaluator receives high-dimensional implicit feature vectors generated by an unsupervised convolutional neural network. The evaluator performs principal component analysis on the high-dimensional implicit feature vectors, reducing the dimensionality of the high-dimensional implicit feature vectors from the original high-dimensional space to a low-dimensional space for visualization and computational analysis. In a specific implementation, the distribution density of all data points in the low-dimensional space is calculated, and the reciprocal of the calculated distribution density is used as the quantized value of information entropy. The quantized value of information entropy is used to reflect the degree of disorder or uncertainty of the feature distribution. It can be understood that a high and uniform distribution density corresponds to a lower reciprocal of the distribution density, which means that the feature distribution is compact, the information entropy is low, and the stability is high.
[0029] In some embodiments, the feature effectiveness evaluator also needs to calculate the inter-class dispersion. When calculating the inter-class dispersion, a subset of normal behavior features with known labels and a subset of abnormal behavior features with known labels are separated from the high-dimensional implicit feature vector set. The centroids of the normal behavior feature subset and the abnormal behavior feature subset are calculated separately. Then, the Euclidean distance between the centroids of the normal behavior feature subset and the abnormal behavior feature subset is calculated, and this Euclidean distance value is used as the inter-class dispersion. The inter-class dispersion is used to quantify the degree of separation between different categories of features in space; a larger inter-class dispersion value indicates a stronger discriminative ability of the feature. The feature effectiveness evaluator finally outputs a feature quality score, which is obtained by weighted summation of the quantified value of information entropy and the inter-class dispersion. The formula for calculating the feature quality score can be expressed as:
[0030]
$$Q = \alpha \cdot \frac{1}{\rho} + \beta \cdot D$$
Where: $Q$ represents the final calculated feature quality score; $\rho$ represents the distribution density of data points in the low-dimensional space, so $1/\rho$ is the quantized value of information entropy; $D$ represents the calculated inter-class dispersion (Euclidean distance); and $\alpha$ and $\beta$ are preset weighting coefficients used to adjust the relative importance of the quantized value of information entropy and the inter-class dispersion in the final score.
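The evaluator can be sketched in NumPy as follows. The source does not say how the distribution density is estimated, so the sketch assumes it is the reciprocal of the mean distance to the `k` nearest neighbours in the PCA-reduced space. The weights are passed in by the caller because their values and signs are not stated. The function names and the label convention (0 for normal, 1 for abnormal) are illustrative.

```python
import numpy as np

def pca_project(x, n_components=2):
    """Project rows of x onto the leading principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def distribution_density(points, k=5):
    """Assumed density estimate: 1 / mean distance to the k nearest neighbours."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    dists.sort(axis=1)
    return 1.0 / dists[:, 1:k + 1].mean()  # column 0 is the self-distance

def feature_quality_score(features, labels, alpha, beta, k=5):
    """Weighted sum of the entropy value (1 / density) and centroid distance."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    rho = distribution_density(pca_project(features), k)
    normal_centroid = features[labels == 0].mean(axis=0)
    abnormal_centroid = features[labels == 1].mean(axis=0)
    dispersion = np.linalg.norm(normal_centroid - abnormal_centroid)
    return alpha * (1.0 / rho) + beta * dispersion
```

Because a lower entropy value corresponds to higher stability in the text, a score meant to rise with quality would need the entropy weight to reduce the score; the sketch leaves that choice to the caller.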
[0031] Referring to Figure 4, in the feature effectiveness evaluation process of the unsupervised convolutional neural network, the coordinated changes in information entropy and inter-class dispersion reflect the iterative optimization of feature extraction quality. The green curve represents information entropy, whose value decreases from an initial 0.81 to 0.19 as the number of iterations increases, indicating that the distribution density of the high-dimensional implicit feature vectors gradually increases and feature stability is continuously enhanced. The orange curve represents inter-class dispersion, whose value increases steadily from an initial 0.33 to 0.90, indicating that the Euclidean distance between the centroids of the normal and abnormal behavior feature subsets continuously expands and the class discrimination ability of the features improves. The opposite trends of the two quantities support the effectiveness of the feature quality score formula and provide a quantitative basis for network structure adjustment and the introduction of regularization terms.
[0032] In practice, when the calculated feature quality score is lower than a preset acceptable threshold, the system triggers an adjustment process for the unsupervised convolutional neural network. The adjustment process first analyzes the distribution of high-dimensional implicit feature vectors in the reduced-dimensional space, determining whether the model exhibits overfitting or underfitting based on this distribution. Overfitting typically manifests as multiple small, dense, isolated clusters of high-dimensional implicit feature vectors in the reduced-dimensional space, with potentially low inter-class dispersion. Underfitting, on the other hand, is characterized by feature points from different categories being mixed together, diffusely distributed, and unable to form clear cluster structures. Therefore, determining which of the two phenomena is present is the basis for selecting the subsequent adjustment strategy.
[0033] In some embodiments, if overfitting is identified, the adjustment strategies include: increasing the dropout rate of the unsupervised convolutional neural network (CNN) so that some neuron outputs are randomly dropped, reducing co-adaptation between neurons; reducing the network layer depth by removing some convolutional layers to lower model complexity; and shrinking the kernel size so that convolution operations focus on more local patterns and avoid learning overly specific noise patterns. If underfitting is identified, the adjustment strategies include: inserting residual connection modules into the existing structure of the CNN, which allow gradients to backpropagate more effectively and mitigate the vanishing gradient problem; and increasing the kernel size so that each convolution operation covers a wider time window, thereby enlarging the model's receptive field and improving the CNN's ability to express complex behavioral patterns. Optionally, network structure adjustment is an iterative process that may require multiple rounds of fine-tuning.
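The two adjustment branches described above can be illustrated with a minimal sketch. The class `EncoderConfig`, its field names, the default values, and the step sizes (0.1 for dropout, 2 for kernel size, one layer for depth) are all assumptions chosen for illustration; the description does not specify them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical hyperparameters of the unsupervised CNN encoder."""
    num_conv_layers: int = 4
    kernel_size: int = 5
    dropout_rate: float = 0.1
    use_residual: bool = False


def adjust_config(cfg: EncoderConfig, diagnosis: str) -> EncoderConfig:
    """Return an adjusted copy of cfg for 'overfitting' or 'underfitting'."""
    if diagnosis == "overfitting":
        # Simplify the network: more dropout, fewer layers, smaller kernels.
        return replace(
            cfg,
            dropout_rate=min(cfg.dropout_rate + 0.1, 0.9),
            num_conv_layers=max(cfg.num_conv_layers - 1, 1),
            kernel_size=max(cfg.kernel_size - 2, 1),
        )
    if diagnosis == "underfitting":
        # Add capacity: residual connections and a wider receptive field.
        return replace(cfg, use_residual=True, kernel_size=cfg.kernel_size + 2)
    raise ValueError(f"unknown diagnosis: {diagnosis!r}")
```

Returning a new configuration rather than mutating the old one keeps each adjustment round reproducible, which is convenient when several rounds are needed.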
[0034] In practice, after adjusting the network layer depth and kernel parameters of the unsupervised convolutional neural network, training does not begin directly with a new network. Instead, a fine-tuning step is introduced. This fine-tuning step uses key feature dimension information obtained through the feedback process, which is added as a regularization term to the loss function of the unsupervised convolutional neural network. In this state, one or more rounds of fine-tuning training are performed on all or some of the weight parameters of the adjusted unsupervised convolutional neural network, allowing the network parameters to initially adapt to the new structural constraints and feature orientation. After fine-tuning, the system restarts from the structured network behavior base table and performs a complete feature extraction process. This involves inputting the network behavior base table back into the adjusted and fine-tuned unsupervised convolutional neural network to generate new high-dimensional implicit feature vectors, and then re-evaluating the feature quality until the new feature quality score meets a preset acceptable threshold.
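The adjust, fine-tune, re-extract, and re-evaluate cycle can be sketched as a simple loop. This is only an outline under stated assumptions: the callables `extract`, `score`, and `adjust_and_finetune` are hypothetical stand-ins for the feature extractor, the feature effectiveness evaluator, and the structure adjustment plus fine-tuning step, and the `max_rounds` cap is an added safeguard that the description does not mention.

```python
from typing import Callable, Sequence, Tuple


def extract_until_qualified(
    base_table: Sequence,
    config: dict,
    extract: Callable[[Sequence, dict], list],
    score: Callable[[list], float],
    adjust_and_finetune: Callable[[dict], dict],
    threshold: float,
    max_rounds: int = 10,
) -> Tuple[list, dict, int]:
    """Repeat extraction from the base table until the quality score qualifies.

    Returns the qualifying features, the configuration that produced them,
    and the number of adjustment rounds that were needed.
    """
    for round_idx in range(max_rounds):
        # Every round restarts from the structured network behavior base table.
        features = extract(base_table, config)
        if score(features) >= threshold:
            return features, config, round_idx
        # Score too low: adjust the structure and fine-tune before retrying.
        config = adjust_and_finetune(config)
    raise RuntimeError("feature quality did not reach the threshold")
```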
[0035] In one embodiment of the invention, after obtaining a high-dimensional implicit feature vector that meets the qualification threshold, the first step is to load a pre-trained behavior classification model. This model is built on a deep belief network. During the early training phase, the behavior classification model has learned the complex decision boundaries between normal network behavior and various known intrusion behaviors through a large amount of labeled data. The high-dimensional implicit feature vector is a sequence containing temporal information. To perform fine-grained detection, the high-dimensional implicit feature vector needs to be divided into multiple consecutive feature blocks according to the chronological order of its generation. Each feature block contains a fixed number of time steps of high-dimensional implicit feature vectors; for example, a feature block can contain feature data from 60 consecutive time steps, representing a summary of behavioral patterns within a time window.
[0036] In some embodiments, feature blocks can be generated using a sliding window mechanism, where the sliding window slides along the time axis with a fixed step size. It can be understood that dividing the high-dimensional implicit feature vector of a long sequence into multiple feature blocks using a sliding window is to adapt to the input size of the behavior classification model and achieve continuous coverage of the entire monitoring period. Subsequently, these feature blocks are sequentially input into the loaded behavior classification model for inference. The behavior classification model performs forward computation on each input feature block. For each input feature block B, the behavior classification model outputs a behavior probability distribution P. The behavior probability distribution P is a vector whose dimension equals the total number of behavior categories, including a "normal" category and multiple specific "abnormal" categories (such as port scanning, brute-force attacks, data leakage, etc.). Each element in the vector represents the probability that the input feature block B is classified as the corresponding behavior category. The calculation of the behavior probability distribution P can be formally expressed as:
$$P = f_{\mathrm{DBN}}(B;\ \theta)$$
[0037] Where: $P$ represents the behavior probability distribution vector output by the behavior classification model; $B$ represents an input feature block; $f_{\mathrm{DBN}}$ represents the forward propagation function of the behavior classification model based on the deep belief network architecture; and $\theta$ represents all the trained weight parameters in the behavior classification model. The probability values of all categories in the behavior probability distribution vector $P$ sum to 1.
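The sliding-window division into feature blocks can be sketched as follows. The function name `make_feature_blocks` and the step size of 30 are assumptions for illustration; only the block length of 60 time steps is taken from the example in the description. In this sketch, trailing time steps that do not fill a complete block are dropped, which is one possible policy among several.

```python
from typing import List, Sequence, Tuple


def make_feature_blocks(
    vectors: Sequence[Sequence[float]],
    block_len: int = 60,
    step: int = 30,
) -> List[Tuple[int, List[Sequence[float]]]]:
    """Split a time-ordered sequence of feature vectors into fixed-length blocks.

    Returns (start_index, block) pairs so that each block can later be traced
    back to its original time window.
    """
    if block_len <= 0 or step <= 0:
        raise ValueError("block_len and step must be positive")
    blocks = []
    # The window slides along the time axis with a fixed step size.
    for start in range(0, len(vectors) - block_len + 1, step):
        blocks.append((start, list(vectors[start:start + block_len])))
    return blocks
```

With a step smaller than the block length, consecutive blocks overlap, so a behavior that straddles a block boundary still appears whole in at least one block.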
[0038] In practical implementation, after the behavior classification model completes the calculation of feature blocks and outputs the behavior probability distribution, the system needs to make a judgment based on the probability distribution. The system presets one or more judgment thresholds for abnormal behavior categories. When the probability value of any abnormal behavior category in the behavior probability distribution P exceeds its corresponding set threshold, an abnormal alarm is triggered. For example, for the abnormal category of "port scanning," the threshold value is set to 0.85. If the probability value of the "port scanning" category in the behavior probability distribution P corresponding to feature block B is 0.92, since 0.92 > 0.85, the behavior represented by this feature block is judged to be abnormal. It can be understood that the set threshold value is a configurable parameter used to balance detection sensitivity and false alarm rate. Different set threshold values can be set for abnormal behaviors of different severity levels. Once a feature block is judged to represent abnormal behavior, the system will record the original time window information corresponding to the feature block. Specifically, the timestamp range of the original high-dimensional implicit feature vector used to generate the feature block is extracted, and the network traffic records within that time period are traced to obtain its main source address. Finally, the two key pieces of information, "timestamp" and "source address," are bound together to form a complete network behavior record with abnormal attributes, which can be used for subsequent alerts and source tracing analysis.
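The per-category threshold judgment and the binding of timestamp range and source address can be sketched as follows. The names `AnomalyRecord` and `judge_block`, the category labels, and the example address are illustrative assumptions; the 0.85 threshold and 0.92 probability come from the port scanning example above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnomalyRecord:
    """A network behavior record with abnormal attributes."""
    category: str
    probability: float
    start_ts: float
    end_ts: float
    source_address: str


def judge_block(
    probs: Dict[str, float],
    thresholds: Dict[str, float],
    start_ts: float,
    end_ts: float,
    source_address: str,
) -> List[AnomalyRecord]:
    """Flag every abnormal category whose probability exceeds its own threshold."""
    records = []
    for category, threshold in thresholds.items():
        probability = probs.get(category, 0.0)
        if probability > threshold:
            # Bind the time window and source address to the abnormal result.
            records.append(
                AnomalyRecord(category, probability, start_ts, end_ts, source_address)
            )
    return records
```

Keeping thresholds in a per-category mapping reflects the point that different severity levels can use different thresholds.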
[0039] In one embodiment of the present invention, after the behavior classification model identifies network behavior records with abnormal attributes, the first step in feature tracing is to extract the activation values of the last hidden layer from the behavior classification model. The activation values of the last hidden layer are vectors, and the magnitude of each dimension directly corresponds to the contribution of different feature dimensions in the input high-dimensional implicit feature vector to the final classification decision. The high-dimensional implicit feature vector typically consists of dozens or hundreds of dimensions, each dimension representing a certain latent behavioral pattern feature automatically learned by the unsupervised convolutional neural network.
[0040] In some embodiments, considering the specific meaning of each dimension in the high-dimensional implicit feature vector, it is necessary to quantitatively calculate the gradient contribution of each dimension in the high-dimensional implicit feature vector to the final probability value of the abnormal behavior category. The gradient contribution reflects the rate of change of the abnormal category probability value output by the model when a certain feature dimension undergoes a small change. For a network behavior record determined to belong to abnormal category $c$, the corresponding high-dimensional implicit feature vector is denoted as $z$, and the probability that the behavior classification model outputs for abnormal category $c$ on this record is denoted as $p_c$. The gradient contribution $g_i$ of feature dimension $i$ is quantified by the absolute value of the partial derivative of $p_c$ with respect to $z_i$, and its formula can be expressed as:
$$g_i = \left| \frac{\partial p_c}{\partial z_i} \right|$$
[0041] Where: $g_i$ represents the gradient contribution value of the $i$-th dimension of the high-dimensional implicit feature vector; $p_c$ represents the probability value of abnormal category $c$ predicted by the behavior classification model; $z_i$ represents the specific value of the $i$-th dimension of the high-dimensional implicit feature vector; and $\partial$ represents the partial differential operation. The larger the gradient contribution value $g_i$, the greater the influence of feature dimension $i$ on the model's determination that the current behavior is abnormal. It can be understood that gradient contribution calculation establishes an interpretable link between the model's decision results and the original feature dimensions.
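The gradient contribution can be approximated numerically without access to the model internals. The sketch below uses central finite differences as a stand-in for the partial derivative; this is a swapped-in technique, since the description does not say how the derivative is computed (a real deployment would more likely use automatic differentiation). The toy classifier `toy_predict` and its weights are hypothetical and exist only to make the example runnable.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_contributions(
    predict: Callable[[Sequence[float]], Sequence[float]],
    z: Sequence[float],
    category: int,
    eps: float = 1e-5,
) -> List[float]:
    """Approximate the absolute partial derivative of p_category w.r.t. each z_i."""
    contributions = []
    for i in range(len(z)):
        plus, minus = list(z), list(z)
        plus[i] += eps
        minus[i] -= eps
        # Central difference of the abnormal-category probability.
        slope = (predict(plus)[category] - predict(minus)[category]) / (2 * eps)
        contributions.append(abs(slope))
    return contributions


# Toy stand-in classifier (hypothetical weights): class 0 = normal, class 1 = abnormal.
TOY_WEIGHTS = [[0.0, 0.0, 0.0], [2.0, 0.0, -0.5]]


def toy_predict(z: Sequence[float]) -> List[float]:
    logits = [sum(w * v for w, v in zip(row, z)) for row in TOY_WEIGHTS]
    return softmax(logits)


contributions = gradient_contributions(toy_predict, [0.0, 0.0, 0.0], category=1)
```

For this toy model the first dimension has the largest contribution and the second has none, matching the weights assigned to the abnormal class.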
[0042] In practical implementation, after calculating the gradient contribution of all dimensions, the system sorts all gradient contribution values in descending order of absolute value and selects the top $K$ dimensions with the largest absolute values, where $K$ is a pre-defined positive integer. These top $K$ dimensions are identified as the key feature dimensions that dominate the current anomalous behavior. Next, these key feature dimensions need to be mapped back to original, understandable network behavior statistics. Each dimension of the high-dimensional implicit feature vector, when generated, has a non-linear but definite correspondence with certain original statistical indicators in the network behavior base table (such as the number of packets for a specific protocol within a specific time window). By analyzing the weights of the encoding layers of the unsupervised convolutional neural network, the original indicators mainly associated with the key feature dimensions can be traced and identified, and these original network behavior statistics are explicitly defined as the key feature dimensions leading to this anomalous behavior. See Table 1 for an example of the tracing results.
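The selection of the dominant dimensions can be sketched in a few lines. The function name `top_k_dimensions` is an assumption; the mapping from the selected dimensions back to raw statistics through the encoder weights is not shown, because the description does not give enough detail to reproduce it.

```python
from typing import List, Sequence


def top_k_dimensions(contributions: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest gradient contributions by absolute value."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Sort dimension indices in descending order of absolute contribution.
    order = sorted(
        range(len(contributions)),
        key=lambda i: abs(contributions[i]),
        reverse=True,
    )
    return order[:k]
```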
[0043] Table 1: Examples of Gradient Contributions of High-Dimensional Implicit Feature Vectors
In practice, after extracting the key feature dimensions that lead to abnormal behavior, these key feature dimensions are fed back into the unsupervised convolutional neural network as constraints for the next round of feature extraction. The core of this feedback mechanism is to integrate the information of the key feature dimensions into the training objective of the unsupervised convolutional neural network in the form of soft labels. Specifically, a new hybrid loss function is designed for the unsupervised convolutional neural network. The hybrid loss function consists of two parts. The first part is the traditional reconstruction error of the unsupervised convolutional neural network, which ensures the fidelity of the encoding and decoding process. The second part is the deviation penalty term for the key feature dimensions, which constrains the features generated by the network to be consistent with the normal pattern in specific dimensions. The hybrid loss function can be expressed as a weighted sum of the two.
[0044] In some embodiments, the calculation of the deviation penalty term for the key feature dimensions relies on historical normal behavior data. The system maintains a normal behavior feature library, which stores the typical values or value ranges, in the key dimensions, of the high-dimensional implicit feature vectors corresponding to historically confirmed normal network behaviors. For newly generated high-dimensional implicit feature vectors, the difference between their values in the key feature dimensions and the typical values of the corresponding dimensions in the normal behavior feature library is calculated. The deviation penalty term penalizes generated features that deviate significantly from normal values in the key feature dimensions. In the next round of feature extraction starting from the network behavior base table, the unsupervised convolutional neural network optimizes its network parameters by minimizing the hybrid loss function. Because the hybrid loss function includes the deviation penalty term for the key feature dimensions, the unsupervised convolutional neural network is forced to generate high-dimensional implicit feature vectors that remain numerically consistent with normal behavior patterns in the key feature dimensions identified as anomaly triggers. Optionally, this constraint is implemented automatically during training by the gradient descent algorithm. In this way, when similar anomalous behavior patterns reappear in network traffic, the features generated by the constrained feature extraction network will approach normal values in the key dimensions, thus reducing the distinguishability between such anomalous features and normal features. At the behavior classification model level, this manifests as suppressing the successful detection of similar anomalous behaviors again.
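A minimal sketch of the hybrid loss is given below, assuming mean squared error for both the reconstruction error and the deviation penalty and a single weighting factor `penalty_weight`. The description only states that the loss is a weighted sum of the two parts, so the specific error measures, the weight value, and all function names are assumptions.

```python
from typing import Dict, Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between the input and its reconstruction."""
    if len(x) != len(x_hat):
        raise ValueError("input and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def deviation_penalty(z: Sequence[float], normal_values: Dict[int, float]) -> float:
    """Mean squared deviation of z from typical normal values on key dimensions.

    normal_values maps a key feature dimension index to its typical value
    taken from the normal behavior feature library.
    """
    if not normal_values:
        return 0.0
    total = sum((z[i] - v) ** 2 for i, v in normal_values.items())
    return total / len(normal_values)


def hybrid_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    z: Sequence[float],
    normal_values: Dict[int, float],
    penalty_weight: float = 0.1,
) -> float:
    """Weighted sum of the reconstruction error and the key-dimension penalty."""
    return reconstruction_error(x, x_hat) + penalty_weight * deviation_penalty(z, normal_values)
```

Only the dimensions listed in `normal_values` are penalized, so the remaining dimensions of the feature vector stay unconstrained by the feedback.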
[0045] In one embodiment of the present invention, a feature space mapping step is performed before inputting the high-dimensional implicit feature vector that meets the qualification threshold into the pre-trained behavior classification model for inference. Since the unsupervised convolutional neural network that generates the high-dimensional implicit feature vector may be trained on historical network environment data, its training data distribution differs from the data distribution of the target network environment to be detected. This difference may lead to feature distribution shift, thus affecting the accuracy of the behavior classification model. To alleviate this problem, a domain adaptation layer is introduced between feature extraction and classification. The domain adaptation layer is essentially a learnable linear transformation matrix. It receives the high-dimensional implicit feature vector output by the unsupervised convolutional neural network and performs a linear transformation on the high-dimensional implicit feature vector through matrix multiplication and bias addition, mapping the high-dimensional implicit feature vector from the source domain feature space to the target domain feature space that better matches the current target network environment. The goal of the mapping process is to eliminate inter-domain distribution differences while preserving the discriminative structure of the high-dimensional implicit feature vector in the original space to the greatest extent possible, so that the mapped features maintain their original intra-class compactness and inter-class separability in the target domain. The mathematical expression of feature space mapping can be:
$$z_t = W z_s + b$$
[0046] Where: $z_s$ represents the high-dimensional implicit feature vector from the source domain, output by the unsupervised convolutional neural network; $W$ represents the weight matrix of the domain adaptation layer; $b$ represents the bias vector of the domain adaptation layer; and $z_t$ represents the high-dimensional implicit feature vector located in the target domain feature space after mapping. It can be understood that the weight matrix $W$ and bias vector $b$ need to be learned through a fine-tuning process that uses a small amount of labeled data from the target domain, or through adversarial learning. After mapping, the high-dimensional implicit feature vector $z_t$ in the target domain feature space is input into the pre-trained behavior classification model for subsequent inference.
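As a rough illustration of the affine mapping, the sketch below fits a diagonal weight matrix and a bias by matching the per-dimension mean and standard deviation of source and target samples. This statistic matching is a deliberately simple swapped-in technique, not the fine-tuning or adversarial learning named in the description, and a diagonal matrix is only a special case of the general weight matrix. The function names are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def fit_diagonal_adapter(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Fit per-dimension scale and bias so source statistics match the target."""
    scale, bias = [], []
    for d in range(len(source[0])):
        src = [v[d] for v in source]
        tgt = [v[d] for v in target]
        src_std = pstdev(src)
        # Fall back to a scale of 1 when the source dimension is constant.
        w = pstdev(tgt) / src_std if src_std > 0 else 1.0
        scale.append(w)
        bias.append(mean(tgt) - w * mean(src))
    return scale, bias


def apply_adapter(
    z: Sequence[float], scale: Sequence[float], bias: Sequence[float]
) -> List[float]:
    """Apply the affine map dimension by dimension (diagonal weight matrix)."""
    return [w * v + b for w, v, b in zip(scale, z, bias)]
```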
[0047] In some embodiments, the behavior classification model is also periodically updated incrementally to adapt to the dynamic evolution of network behavior patterns. The system automatically executes the incremental update process at fixed time intervals, such as every 24 hours or every week. The first step of the update process is to collect the network behavior data newly generated in the current network environment since the previous model version was deployed, and to filter out from this new data the records whose labels have been manually confirmed by the security management platform or by analysts. This batch of accurately labeled new data is then aggregated to form an incremental training set. It is understood that the incremental training set is much smaller than the initial training set; its purpose is to let the model learn the latest changes in behavior patterns rather than learn from scratch. The deployed behavior classification model is then retrained in small batches using the incremental training set. During retraining, a small learning rate is typically used, and the weight parameters in the behavior classification model are iteratively updated using the data from the incremental training set. This process aims to adapt the knowledge of the behavior classification model to the latest changes in network behavior patterns while avoiding catastrophic forgetting of existing knowledge.
[0048] See Figure 5. In the periodic incremental update process of the behavior classification model, the change of the stability score over time intuitively verifies the effectiveness of the incremental learning strategy. The purple curve and the filled area represent the model stability score, which gradually increases from 0.70 for the initial model to 0.85 in the 5th period, showing a steady upward trend. This result indicates that the parameter update strategy based on mini-batch incremental training and a low learning rate not only enables the model to learn new network behavior patterns effectively but also avoids catastrophic forgetting of existing knowledge, ensuring the model's continued reliability and robustness in dynamic network environments. This provides quantitative evidence of model stability for deployment and online application.
[0049] In practice, after incremental retraining, the updated behavior classification model cannot be deployed immediately; its stability must first be verified. This verification involves using the updated behavior classification model to re-infer historical high-dimensional implicit feature vectors from a past period. These historical high-dimensional implicit feature vectors correspond to network behavior records that have already been judged by the model and observed over a period of time. By comparing the inference results of the updated and old versions of the behavior classification model on this historical data, a consistency index is calculated, such as the Kappa coefficient of the classification results or the change in judgment accuracy on historically confirmed samples. If the consistency index is higher than a preset stability threshold, the model update is considered stable, meaning that the new model has absorbed new knowledge without compromising the original correct decision boundaries. In this case, the updated behavior classification model is deployed, replacing the old model version. Optionally, if the stability verification fails, the incremental update is abandoned, the old model is retained and continues to run, and the update failure is recorded for subsequent analysis. Through this periodic incremental update mechanism, the behavior classification model can continuously evolve, effectively addressing the challenges posed by new network attacks or changes in network structure.
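The stability check can be sketched using Cohen's Kappa coefficient between the predictions of the old and updated models on the same historical feature vectors. The function names and the default stability threshold of 0.8 are assumptions; the description only requires the consistency index to be higher than a preset threshold.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's Kappa agreement between two label sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance from the marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both sequences use a single identical label.
    return (observed - expected) / (1 - expected)


def should_deploy(
    old_preds: Sequence[Hashable],
    new_preds: Sequence[Hashable],
    stability_threshold: float = 0.8,
) -> bool:
    """Deploy the updated model only if agreement exceeds the stability threshold."""
    return cohen_kappa(old_preds, new_preds) > stability_threshold
```

Kappa corrects raw agreement for chance, which matters here because network traffic is usually dominated by the normal category and plain agreement would look high even for an unstable update.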
[0050] The above are merely preferred embodiments of the present invention and are not intended to limit the present invention in any other way. Any person skilled in the art may make changes or modifications to the above-disclosed technical content to create equivalent embodiments that can be applied to other fields. However, any simple modifications, equivalent changes, and modifications made to the above embodiments based on the technical essence of the present invention without departing from the scope of the present invention shall still fall within the protection scope of the present invention.
Claims
1. A method for adaptive feature extraction and detection of network behavior based on deep learning, characterized in that the method includes: collecting raw traffic data packets in the target network environment and cleaning the raw traffic data packets to obtain a structured network behavior base table; inputting the network behavior base table into an unsupervised convolutional neural network, which performs deep encoding on the multidimensional attributes in the network behavior base table to automatically generate a set of high-dimensional implicit feature vectors; constructing a feature effectiveness evaluator, which receives the high-dimensional implicit feature vectors and calculates a feature quality score based on the information entropy of the feature distribution and the inter-class scatter; when the feature quality score is lower than a preset qualified threshold, adjusting the network layer depth and convolution kernel parameters of the unsupervised convolutional neural network, and re-executing the process of generating the high-dimensional implicit feature vectors from the network behavior base table until the feature quality score meets the qualified threshold; inputting the high-dimensional implicit feature vectors that meet the qualified threshold into a pre-trained behavior classification model for inference to identify network behavior records with abnormal attributes; and performing feature tracing on the network behavior records with abnormal attributes, extracting the key feature dimensions that lead to the abnormal behavior, and feeding the key feature dimensions back to the unsupervised convolutional neural network as constraints for the next round of feature extraction.
2. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 1, characterized in that cleaning the raw traffic data packets to obtain a structured network behavior base table includes: the raw traffic data packets contain the source address, destination address, protocol type, and payload; parsing the header information of the raw traffic data packets, classifying the data packets according to the transport layer protocol type, and extracting the session flows belonging to the Hypertext Transfer Protocol, File Transfer Protocol, and Simple Mail Transfer Protocol respectively; for each type of session flow, calculating the time interval sequence of its data packets, and denoising the time interval sequence based on the network reference delay to filter out abnormal time slices caused by network jitter; reassembling the denoised session flows to merge scattered short connections into long connection sessions; and counting the number of bytes, packets, and retransmissions for each long connection session to form the structured network behavior base table.
3. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 2, characterized in that inputting the network behavior base table into an unsupervised convolutional neural network, which performs deep encoding on the multidimensional attributes in the network behavior base table to automatically generate a set of high-dimensional implicit feature vectors, includes: the high-dimensional implicit feature vectors characterize the latent patterns of network behavior in time and space; converting the structured network behavior base table into a two-dimensional feature tensor, where one dimension represents the time step and the other dimension represents the network behavior statistics within the time step; using the encoder part of the unsupervised convolutional neural network to perform multi-scale convolution operations on the two-dimensional feature tensor to extract behavioral patterns within a local time window; introducing a self-attention mechanism layer at the end of the encoder, which globally correlates the behavioral patterns within the local time windows and enhances the capture of long-range dependencies; and flattening the tensor processed by the self-attention mechanism layer and compressing it through a fully connected layer to finally output the high-dimensional implicit feature vectors.
4. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 3, characterized in that constructing a feature effectiveness evaluator, which receives the high-dimensional implicit feature vectors and calculates a feature quality score based on the information entropy of the feature distribution and the inter-class scatter, includes: performing principal component analysis on the high-dimensional implicit feature vectors to reduce their dimensionality to a low-dimensional space suitable for visualization; calculating the distribution density of data points in the low-dimensional space, and using the reciprocal of the distribution density as the quantized value of the information entropy; separating a known normal behavior feature subset and a known abnormal behavior feature subset from the high-dimensional implicit feature vectors, calculating the centroid of each of the two subsets, calculating the Euclidean distance between the two centroids, and using the calculated Euclidean distance as the inter-class scatter; and obtaining the feature quality score by weighting and summing the quantized value of the information entropy and the inter-class scatter, wherein the quantized value of the information entropy reflects the stability of the features and the inter-class scatter reflects the discriminative ability of the features.
5. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 4, characterized in that, when the feature quality score is lower than the preset qualified threshold, adjusting the network layer depth and convolution kernel parameters of the unsupervised convolutional neural network and re-executing the process of generating the high-dimensional implicit feature vectors from the network behavior base table includes: if the feature quality score is lower than the qualified threshold, analyzing the distribution of the high-dimensional implicit feature vectors in the reduced-dimensional space to determine whether overfitting or underfitting is present; when overfitting is detected, increasing the dropout rate of the unsupervised convolutional neural network, reducing the network layer depth, and decreasing the convolution kernel size to simplify the network structure; when underfitting is detected, inserting a residual connection module into the existing structure of the unsupervised convolutional neural network, and increasing the convolution kernel size to enlarge the receptive field and improve the network's ability to express complex patterns; and after adjusting the network structure, fine-tuning the weights of the unsupervised convolutional neural network using the fed-back key feature dimensions as regularization terms, and then performing feature extraction again starting from the network behavior base table.
6. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 5, characterized in that inputting the high-dimensional implicit feature vectors that meet the qualified threshold into a pre-trained behavior classification model for inference to identify network behavior records with abnormal attributes includes: loading the behavior classification model based on a deep belief network, which has learned the decision boundaries between normal behavior and various intrusion behaviors during the training phase; dividing the high-dimensional implicit feature vectors into multiple feature blocks in chronological order and sequentially inputting them into the behavior classification model for sliding window detection, wherein the behavior classification model outputs a behavior probability distribution for each feature block; and when the probability value of an abnormal behavior category in the behavior probability distribution corresponding to a certain feature block exceeds a set threshold, marking the timestamp and source address corresponding to the feature block as a network behavior record with abnormal attributes.
7. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 6, characterized in that performing feature tracing on the identified network behavior records with abnormal attributes to extract the key feature dimensions leading to the abnormal behavior includes: extracting the activation values of the last hidden layer from the behavior classification model, wherein the activation values correspond to the contribution of the high-dimensional implicit feature vector to the classification decision; based on the specific meaning of each dimension in the high-dimensional implicit feature vector, calculating the gradient contribution of each dimension in the high-dimensional implicit feature vector to the probability value of the abnormal behavior category; and selecting the top several dimensions with the largest absolute values of gradient contribution, and defining the original network behavior statistics corresponding to these dimensions as the key feature dimensions that lead to the abnormal behavior.
8. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 7, characterized in that extracting the key feature dimensions leading to the abnormal behavior and feeding them back into the unsupervised convolutional neural network as constraints for the next round of feature extraction includes: using the values of the key feature dimensions as soft labels and adding them to the loss function of the unsupervised convolutional neural network to form a hybrid loss function, which simultaneously includes the reconstruction error and the deviation penalty term for the key feature dimensions; and in the next round of feature extraction from the network behavior base table, optimizing the network parameters of the unsupervised convolutional neural network so that the generated high-dimensional implicit feature vectors are forced to remain consistent with the normal behavior pattern in the key feature dimensions, thereby suppressing the recurrence of similar abnormal behaviors.
9. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 8, characterized in that, before inputting the high-dimensional implicit feature vectors that meet the qualified threshold into the pre-trained behavior classification model for inference, the method also includes a feature space mapping step: since the training data distribution of the unsupervised convolutional neural network differs from the data distribution of the current network environment, a domain adaptation layer is introduced; the domain adaptation layer receives the high-dimensional implicit feature vector and performs a linear transformation on it, mapping it from the source domain feature space to the target domain feature space so that the mapped features maintain the original intra-class compactness and inter-class separability in the target domain; and the mapped features are then input into the behavior classification model.
10. The method for adaptive feature extraction and detection of network behavior based on deep learning as described in claim 9, characterized in that the method also includes periodic incremental updates to the behavior classification model: at every fixed time period, aggregating newly generated network behavior data that has been manually labeled to form an incremental training set; retraining the behavior classification model in small batches using the incremental training set to update the weight parameters in the model and adapt to the latest changes in network behavior patterns; and after retraining, using the updated behavior classification model to re-infer the historical high-dimensional implicit feature vectors to verify the stability of the model update, and, if the update is stable, deploying the updated behavior classification model online to replace the old version.