Feature repositioning image encoding method for network traffic classification
By employing a feature relocation image coding method, the problems of feature collision, unutilized importance, and spatial instability in network traffic classification are solved. A stable and consistent feature mapping is established, which improves the discriminative ability and robustness of network traffic classification and is applicable to network traffic analysis and intrusion detection.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING UNIV OF POSTS & TELECOMM
- Filing Date
- 2026-03-19
- Publication Date
- 2026-06-19
AI Technical Summary
Existing network traffic classification methods suffer from problems such as collisions of traffic statistical features, underutilization of importance, spatial structure instability, and class imbalance, which affect classification performance.
A feature relocation image coding method is adopted. Through data preprocessing, two-dimensional feature embedding, initial pixel position mapping, feature importance ranking and relocation, and minority class enhancement steps, a one-to-one mapping relationship between traffic statistics features and pixel positions is established. Combined with generative adversarial networks to generate minority class samples, a stable and discriminative two-dimensional image representation is constructed.
It achieves lossless feature mapping, highlights key features, improves classification and discrimination capabilities, enhances model stability and robustness, alleviates class imbalance problems, and is applicable to various network traffic analysis and intrusion detection scenarios.
Figure CN122244185A_ABST
Description
Technical Field
[0001] This invention relates to the fields of network security and artificial intelligence, specifically to a feature relocation image coding method for network traffic classification.
Background Technology
[0002] With the widespread deployment of network applications and encrypted communication technologies, network traffic exhibits high complexity and uninterpretability at both the protocol and content levels. Existing network traffic classification methods mainly include methods based on statistical features, methods based on deep learning, and methods that classify traffic features by mapping them to images.
[0003] Image-based network traffic classification methods, by encoding high-dimensional features into two-dimensional images and leveraging the spatial modeling capabilities of convolutional neural networks, have improved classification performance to some extent. However, existing technologies generally suffer from the following problems in mapping traffic statistical features to two-dimensional image representations:
[0004] 1. Severe collisions among traffic statistics features: When high-dimensional traffic statistics features are embedded into a two-dimensional pixel grid, different features may be mapped to the same pixel position, causing feature collisions and the loss of some feature information.
[0005] 2. The importance of traffic statistics features is not fully utilized: Existing methods usually treat all features equally and do not consider the differences in the contribution of different traffic statistics features to the classification task, making it difficult to effectively express traffic statistics features that play an important role in classification during the image encoding stage.
[0006] 3. Insufficient spatial structure stability: Some feature mapping schemes produce unstable spatial structures under different data samples or different training stages, resulting in a lack of consistency in the positional distribution of the same traffic statistical feature in the two-dimensional image, which affects the stability and generalization ability of subsequent classification models.
[0007] 4. Class imbalance affects classification performance: Network traffic data often suffers from insufficient minority class samples. Training directly based on original features or unstable image encoding can easily cause the classifier to be biased towards the majority class.
[0008] Therefore, there is an urgent need for a network traffic image encoding and classification method that can preserve all feature information, avoid collisions of traffic statistics features, highlight key traffic statistics features, and possess good stability and scalability during the two-dimensional image encoding process.
Summary of the Invention
[0009] To address the aforementioned issues, this invention discloses a feature relocation image coding method for network traffic classification. This method can preserve all feature information, avoid collisions of traffic statistics features, highlight key traffic statistics features, and possesses good stability during the two-dimensional image coding process.
[0010] A feature relocation image coding method for network traffic classification includes the following steps:
[0011] S1 Data preprocessing step: Obtain network traffic sample data, perform data cleaning and numerical normalization on each network traffic sample, and obtain the normalized feature matrix.
[0012] S2 Two-dimensional embedding step for traffic statistics features: Transpose the normalized feature matrix to obtain multiple traffic statistics feature vectors, and perform a two-dimensional embedding mapping on each traffic statistics feature vector to obtain the embedding coordinates of each traffic statistics feature in two-dimensional space.
[0013] S3 Initial pixel position mapping step: Based on the two-dimensional embedding coordinates, map each traffic statistics feature to an initial pixel position in the two-dimensional pixel grid, thereby establishing the correspondence between traffic statistics features and two-dimensional image pixel positions.
[0014] S4 Traffic statistics feature importance ranking step: Calculate the mutual information value between each traffic statistics feature and the network traffic category label, and rank all traffic statistics features in descending order of mutual information, so that traffic statistics features with higher importance participate in subsequent pixel allocation first.
[0015] S5 Feature relocation and conflict resolution step: When multiple traffic statistics features are mapped to the same pixel position, retain the pixel positions of the more important traffic statistics features first according to the importance ranking, and assign new pixel positions to the conflicting traffic statistics features through a search strategy, thereby avoiding feature collisions.
[0016] S6 Two-dimensional image construction step: Based on the final mapping between traffic statistics features and pixel positions, map the normalized feature vector of each network traffic sample to the two-dimensional pixel grid to construct the corresponding network traffic image representation.
[0017] S7 Minority class enhancement step: Based on the obtained two-dimensional images, introduce an auxiliary classifier generative adversarial network (ACGAN) to generate synthetic network traffic image samples for the specified minority classes. The synthetic samples are used together with the original samples for training, supplementing the minority class data and alleviating the impact of class imbalance on classification performance.
[0018] S8 Classification step: Input the enhanced two-dimensional images into the network traffic classification model and output the corresponding network traffic category to achieve network traffic classification.
[0019] A network traffic image encoding and classification system based on feature relocation with mutual information preservation includes: a data processing module for acquiring network traffic sample data, performing data cleaning and numerical normalization on the network traffic samples, and constructing a network traffic feature matrix; and a two-dimensional feature embedding module for using traffic statistical features as embedding objects, mapping the traffic statistical features to a two-dimensional continuous space using a dimensionality reduction algorithm, and obtaining the two-dimensional embedding coordinates corresponding to each traffic statistical feature.
[0020] The grid mapping and collision detection module is used to map two-dimensional embedded coordinates to discrete pixel grids to obtain the initial pixel positions of each flow statistics feature, and to detect whether there are feature collisions where multiple flow statistics features are mapped to the same pixel position.
[0021] The mutual information evaluation and ranking module is used to calculate the mutual information value between each traffic statistical feature and the network traffic category label, and to rank the importance of the traffic statistical features according to the mutual information value.
[0022] The feature relocation module is used to assign new pixel positions to the conflicting traffic statistics features based on the importance ranking results of the traffic statistics features when pixel conflicts occur, and to establish a one-to-one mapping relationship between traffic statistics features and pixel positions.
[0023] The image construction module is used to encode the normalized feature vector of the network traffic sample into a two-dimensional image representation based on the mapping relationship between the traffic statistics features and pixel positions.
[0024] The minority class sample augmentation module is used to generate synthetic network traffic image samples of a specified class based on the auxiliary classifier generative adversarial network to supplement the minority class sample data.
[0025] The classification module is used to input the two-dimensional image into the network traffic classification model and output the corresponding network traffic category; each of the above modules is used to execute the corresponding functional steps in the method.
[0026] The beneficial effects of this invention are:
[0027] 1. Achieve lossless image encoding of traffic statistics features. Through a feature relocation mechanism, ensure that each traffic statistics feature is mapped to a unique pixel location, avoiding feature overwriting and information loss.
[0028] 2. Highlight key traffic statistics features to improve classification and discrimination capabilities; sort traffic statistics features by mutual information so that traffic statistics features that contribute highly to the classification results occupy core spatial positions first, thereby improving the discrimination capability of the coding results.
[0029] 3. Improve the stability and consistency of spatial mapping. Once a one-to-one mapping relationship between traffic statistics features and pixels is established, it can be reused in the training and inference stages, thereby improving the robustness of the system.
[0030] 4. The generated two-dimensional images can be directly used in convolutional neural networks for network traffic classification, and can be combined with generative adversarial networks for data augmentation to alleviate class imbalance problems. This invention does not rely on specific network protocols or plaintext content and is applicable to various network traffic analysis and intrusion detection scenarios.
Description of the Drawings
[0031] Figure 1 is a schematic diagram of the method flow of the present invention.
Detailed Implementation
[0032] The present invention will be further illustrated below with reference to the accompanying drawings and specific embodiments. It should be understood that the following specific embodiments are for illustrative purposes only and are not intended to limit the scope of the invention. It should be noted that the terms "front," "rear," "left," "right," "up," and "down" used in the following description refer to directions in the accompanying drawings, and the terms "inner" and "outer" refer to directions toward or away from the geometric center of a specific component, respectively.
[0033] Example 1:
[0034] As shown in Figure 1, a feature relocation image coding method based on mutual information preservation is used for network traffic classification. The method includes the following steps:
[0035] S1: To ensure the stability and reproducibility of the MPFR model during training, this paper first performs unified data preprocessing on the CICIDS2017 dataset. The preprocessing process only includes necessary data cleaning and numerical normalization operations to obtain a normalized network traffic feature matrix, without introducing additional feature combinations or high-dimensional feature expansions, to ensure that subsequent performance improvements mainly come from the encoding strategy itself.
[0036] Specifically, the raw data is first cleaned, including checking and deleting duplicate samples, missing values, and obviously abnormal data records to reduce the interference of noisy samples on model training. In this embodiment, network traffic samples can be derived from publicly available network traffic datasets, such as the CICIDS2017 dataset. Each network traffic sample consists of multiple traffic statistical features, such as destination port, flow duration, number of forward packets, number of backward packets, and traffic statistical time interval.
[0037] Given N network traffic samples, each containing M traffic statistical features, the original feature matrix can be represented as:
[0038] $$X = [x_{ij}]_{N \times M}, \quad i = 1, \ldots, N, \; j = 1, \ldots, M$$
[0039] where $x_{ij}$ represents the original value of the i-th network traffic sample on the j-th traffic statistical feature.
[0040] Next, because continuous numerical features account for the largest proportion of the dataset, and in order to eliminate the influence of differing feature scales on subsequent processing, the feature matrix is normalized. This paper adopts Min-Max normalization, which linearly scales each feature and maps features of different scales to the interval [0, 1]. The normalization formula is shown in Equation (1):
[0041] $$\tilde{x}_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}} \quad (1)$$
[0042] where $x_{ij}$ represents the original value of the j-th traffic statistical feature in the i-th network traffic sample, $x_j^{\max}$ and $x_j^{\min}$ represent the maximum and minimum values of the j-th traffic statistical feature over all samples, respectively, and $\tilde{x}_{ij}$ is the normalized value.
[0043] Through the above processing, the normalized sample feature vector of each network traffic sample can be obtained:
[0044] $$\tilde{\mathbf{x}}_i = [\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{iM}]$$
[0045] where $\tilde{\mathbf{x}}_i$ represents the normalized traffic statistics feature vector of the i-th network traffic sample.
[0046] This normalization operation can effectively prevent high-value features from having a dominant influence on gradient updates during model training, thereby improving the model's convergence stability and generalization ability.
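The column-wise Min-Max scaling described above can be sketched in plain Python as follows. This is a minimal illustration rather than the inventors' implementation: the function name `min_max_normalize`, the toy values, and the handling of constant columns (mapped to 0.0) are assumptions not stated in the source.

```python
from typing import List


def min_max_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Scale every column of an N x M feature matrix to the interval [0, 1]."""
    if not matrix:
        return []
    n_features = len(matrix[0])
    col_min = [min(row[j] for row in matrix) for j in range(n_features)]
    col_max = [max(row[j] for row in matrix) for j in range(n_features)]
    normalized = []
    for row in matrix:
        new_row = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # Assumption: a constant column (span == 0) is mapped to 0.0;
            # the source does not say how this case is handled.
            new_row.append((value - col_min[j]) / span if span > 0 else 0.0)
        normalized.append(new_row)
    return normalized


# Toy matrix: 3 samples x 3 features (hypothetical values).
raw = [
    [80.0, 1200.0, 5.0],
    [443.0, 300.0, 5.0],
    [8080.0, 600.0, 5.0],
]
scaled = min_max_normalize(raw)
```

The minimum and maximum of each column would normally be computed on the training data and reused at inference time, so that the same feature always maps to the same value range.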
[0047] S2: Two-dimensional feature embedding. During the image encoding process, the MPFR model first uses t-distributed Stochastic Neighbor Embedding (t-SNE) to embed the network traffic statistics features in two dimensions and thereby learn the relative spatial relationships between features. This paper uses the traffic statistics feature vector as the embedding object, mapping the value vector of each traffic statistics feature across all samples to a two-dimensional continuous space to obtain its two-dimensional embedding coordinates.
[0048] If the original dataset contains a total of N network traffic samples and each sample contains M traffic statistics features, then the normalized feature matrix is:
[0049] $$\tilde{X} = [\tilde{x}_{ij}]_{N \times M}$$
[0050] Each row represents a sample feature vector, and each column represents a traffic statistics feature.
[0051] By transposing the feature matrix, we obtain a set consisting of M feature vectors:
[0052] $$F = \{\mathbf{f}_1, \mathbf{f}_2, \ldots, \mathbf{f}_M\}, \quad \mathbf{f}_j = [\tilde{x}_{1j}, \tilde{x}_{2j}, \ldots, \tilde{x}_{Nj}]^{\top} \quad (3)$$
[0053] where $\mathbf{f}_j$ denotes the vector of values of the j-th traffic statistics feature across all samples, and $\tilde{x}_{ij}$ represents the normalized value of the i-th network traffic sample on the j-th traffic statistical feature.
[0054] For each traffic statistics feature vector, perform a two-dimensional embedding mapping to obtain two-dimensional embedding coordinates:
[0055] $$\mathbf{z}_j = (u_j, v_j), \quad j = 1, 2, \ldots, M$$
[0056] where $\mathbf{z}_j$ represents the two-dimensional coordinate point obtained by embedding the j-th traffic statistics feature vector $\mathbf{f}_j$, and $u_j$ and $v_j$ represent the coordinate components of that point along the x-axis and y-axis, respectively. t-SNE constructs feature similarity probability distributions in both the high-dimensional feature space and the low-dimensional embedding space and minimizes the Kullback-Leibler divergence between the two distributions. This ensures that traffic statistics features with similar value distributions in the original space remain close neighbors in the two-dimensional space. This process provides a continuous and semantically grounded initial structure for the subsequent spatial layout of features onto pixels.
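For reference, the Kullback-Leibler objective mentioned above takes the following form in the standard t-SNE formulation. The source does not print it, so the symbols $p_{jk}$, $q_{jk}$ and $\mathbf{z}_j$ are notation assumed here: $p_{jk}$ is the pairwise similarity of features j and k in the original space (computed from Gaussian kernels whose bandwidths are set by the perplexity), and $q_{jk}$ is their similarity in the two-dimensional embedding.

```latex
q_{jk} = \frac{\left(1 + \lVert \mathbf{z}_j - \mathbf{z}_k \rVert^2\right)^{-1}}
              {\sum_{l \neq m} \left(1 + \lVert \mathbf{z}_l - \mathbf{z}_m \rVert^2\right)^{-1}},
\qquad
\mathrm{KL}(P \,\Vert\, Q) = \sum_{j \neq k} p_{jk} \log \frac{p_{jk}}{q_{jk}}
```

Because the objective is minimized by gradient descent from a random or PCA initialization, the resulting layout can differ between runs, which is one reason a fixed feature-to-pixel mapping is worth storing and reusing once it has been computed.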
[0057] S3: Discrete Mapping and Feature Collision Detection. After completing the 2D feature embedding, it is necessary to map the feature coordinates in continuous space to a discrete pixel grid of finite size. Let the image resolution be $W \times H$. This paper uses Min-Max normalization and rounding down to map feature coordinates to pixel indices:
[0058] $$a_j = \left\lfloor \frac{u_j - \min(U)}{\max(U) - \min(U)} \cdot (W - 1) \right\rfloor$$
[0059] $$b_j = \left\lfloor \frac{v_j - \min(V)}{\max(V) - \min(V)} \cdot (H - 1) \right\rfloor$$
[0060] where $p_j^{(0)} = (a_j, b_j)$ indicates the initial pixel position of the j-th traffic statistics feature, $(u_j, v_j)$ are its two-dimensional embedding coordinates, $\lfloor \cdot \rfloor$ indicates rounding down, and $U$ and $V$ represent the sets of coordinate components of all M features in the x-axis and y-axis directions, respectively.
[0061] Based on the above, the initial pixel position of each traffic statistics feature can be obtained. When there exist $j \neq k$ satisfying $p_j^{(0)} = p_k^{(0)}$, the system identifies a feature collision, meaning that multiple traffic statistics features are mapped to the same pixel location. Subsequent steps perform conflict resolution and relocation on the colliding traffic statistics features to establish a one-to-one mapping between traffic statistics features and pixel locations.
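The grid mapping and collision check of step S3 can be sketched as below. This is an illustration under stated assumptions: pixel positions are stored as `(x_index, y_index)` tuples, the scaling factor is `size - 1` so that the largest coordinate lands on the last pixel, and the names `to_pixel_grid` and `find_collisions` as well as the toy coordinates are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def to_pixel_grid(coords: List[Tuple[float, float]], width: int, height: int) -> List[Pixel]:
    """Min-max scale 2D embedding coordinates and round down to pixel indices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]

    def scale(value: float, lo: float, hi: float, size: int) -> int:
        if hi == lo:  # degenerate axis: every feature shares one coordinate
            return 0
        # Assumption: scale by (size - 1) so the maximum maps to the last index.
        return math.floor((value - lo) / (hi - lo) * (size - 1))

    return [
        (scale(x, min(xs), max(xs), width), scale(y, min(ys), max(ys), height))
        for x, y in coords
    ]


def find_collisions(pixels: List[Pixel]) -> Dict[Pixel, List[int]]:
    """Return every pixel that is claimed by more than one feature index."""
    groups: Dict[Pixel, List[int]] = defaultdict(list)
    for j, pixel in enumerate(pixels):
        groups[pixel].append(j)
    return {pixel: idx for pixel, idx in groups.items() if len(idx) > 1}


# Four features with hypothetical embedding coordinates on a 4 x 4 grid.
embedding = [(0.0, 0.0), (10.0, 10.0), (0.4, 0.3), (5.0, 9.0)]
pixels = to_pixel_grid(embedding, width=4, height=4)
collisions = find_collisions(pixels)
```

In this toy case features 0 and 2 lie close together in the embedding and fall into the same pixel, which is exactly the situation the relocation step has to resolve.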
[0062] S4: Feature Importance Evaluation Based on Mutual Information. Considering the varying contributions of different traffic statistics features to the classification task, this paper introduces mutual information as a measure of feature importance and ranks traffic statistics features by importance based on their mutual information values. This ensures that traffic statistics features that contribute more to classification are prioritized during subsequent pixel allocation and relocation.
[0063] Let the random variable for the category label be $Y$ and the random variable corresponding to the j-th traffic statistical feature be $X_j$. The mutual information between the j-th traffic statistics feature and the category label is defined as follows:
[0064] $$I(X_j; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
[0065] where $I(X_j; Y)$ represents the mutual information value between the j-th traffic statistics feature and the category label, $x$ ranges over the possible values of the traffic statistics feature, $y$ ranges over the possible values of the category label, $p(x, y)$ represents the joint probability distribution of traffic statistics feature values and category labels, $p(x)$ represents the marginal probability distribution of the traffic statistics feature values, and $p(y)$ represents the marginal probability distribution of the category labels.
[0066] A higher mutual information value indicates a stronger statistical correlation between the traffic statistics feature and the category label, and thus a higher discriminative ability. Based on this, all M traffic statistics features are sorted from largest to smallest mutual information value to obtain a priority sequence for the traffic statistics features:
[0067] $$\pi = \operatorname{sort}_{\downarrow}\big(I(X_1; Y), I(X_2; Y), \ldots, I(X_M; Y)\big)$$
[0068] where $\pi$ indicates the sorted sequence of traffic statistics feature indices and $\operatorname{sort}_{\downarrow}$ indicates sorting in descending order.
[0069] In the subsequent pixel allocation process, the flow statistics features with higher mutual information are given priority to retain their initial pixel positions, thereby improving the expressive power and structural stability of key flow statistics features in the two-dimensional image space.
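The mutual-information ranking of step S4 can be illustrated with the following sketch. It assumes the normalized features are first discretized into equal-width bins so that the probabilities can be estimated by counting; the source does not specify an estimator, so `discretize`, the bin count, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def discretize(values: Sequence[float], bins: int = 4) -> List[int]:
    """Map values in [0, 1] to equal-width bin indices (assumed estimator)."""
    return [min(bins - 1, int(v * bins)) for v in values]


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Plug-in estimate of I(X; Y) in nats from empirical frequencies."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_x = Counter(feature)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def rank_features(columns: List[Sequence[float]], labels: Sequence[int],
                  bins: int = 4) -> Tuple[List[int], List[float]]:
    """Return feature indices sorted by descending mutual information."""
    scores = [mutual_information(discretize(col, bins), labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Three normalized feature columns over four samples (hypothetical values).
labels = [0, 0, 1, 1]
columns = [
    [0.1, 0.2, 0.8, 0.9],  # separates the classes perfectly
    [0.1, 0.9, 0.1, 0.9],  # independent of the label
    [0.1, 0.1, 0.1, 0.9],  # partially informative
]
order, scores = rank_features(columns, labels)
```

Here the perfectly separating feature receives $\ln 2$ nats, the independent one receives 0, and the resulting priority order is feature 0, then 2, then 1.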
[0070] S5: Feature relocation based on spiral search. When allocating pixel positions according to mutual information priority, if the target pixel is not yet occupied, the mapping is completed directly; if the target pixel is already occupied, the feature relocation mechanism is triggered. To avoid discarding traffic statistics features while maintaining local spatial structure as much as possible, the MPFR model adopts a spiral search strategy centered on the initial pixel position to find the nearest free pixel in the neighborhood.
[0071] Specifically, let $\mathcal{O}$ denote the set of currently occupied pixels. For a feature $j$ that is in conflict, its final pixel position is determined by solving the following problem:
[0072] $$p_j^{*} = \arg\min_{q \notin \mathcal{O}} d\big(q, p_j^{(0)}\big)$$
[0073] where $d(\cdot, \cdot)$ represents the distance between pixels, $p_j^{*}$ is the final pixel position assigned to the j-th traffic statistics feature after relocation, $p_j^{(0)}$ is the initial pixel position of the j-th traffic statistics feature obtained in step S3, $q$ is a candidate pixel position in the pixel grid, and $\arg\min_{q \notin \mathcal{O}}$ indicates that, among the pixel positions satisfying the condition $q \notin \mathcal{O}$, the one that minimizes the distance function is selected. In this embodiment, the distance function can be defined using Euclidean distance:
[0074] $$d(q, p) = \sqrt{(q_x - p_x)^2 + (q_y - p_y)^2}$$
[0075] This strategy ensures that high mutual information features are preferentially preserved at their original location or its neighboring locations. In practice, a spiral search centered on the initial pixel position $p_j^{(0)}$ progressively searches outward for free pixels. When the nearest unoccupied pixel is found, that position is used as the new pixel coordinate $p_j^{*}$ and is added to the set $\mathcal{O}$. This feature relocation mechanism ensures that each traffic statistics feature corresponds to a unique pixel location, thereby avoiding feature overwriting and information loss.
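One way to realize the priority-ordered relocation is sketched below. The outward search visits square rings of growing radius around the initial pixel and stops once no farther ring can contain a closer free pixel, so the result is the Euclidean-nearest free position. The ring traversal order, the tie-breaking rule (first candidate found wins), and the names `nearest_free_pixel` and `relocate` are assumptions, not details given in the source.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Pixel = Tuple[int, int]


def nearest_free_pixel(start: Pixel, occupied: Set[Pixel],
                       width: int, height: int) -> Optional[Pixel]:
    """Search outward ring by ring for the Euclidean-nearest unoccupied pixel."""
    best, best_dist = None, math.inf
    for radius in range(max(width, height)):
        # Every pixel on ring `radius` is at least `radius` away, so once the
        # radius exceeds the best distance found, no closer pixel can remain.
        if radius > best_dist:
            break
        for x in range(start[0] - radius, start[0] + radius + 1):
            for y in range(start[1] - radius, start[1] + radius + 1):
                if max(abs(x - start[0]), abs(y - start[1])) != radius:
                    continue  # interior pixels were covered by earlier rings
                if not (0 <= x < width and 0 <= y < height):
                    continue
                if (x, y) in occupied:
                    continue
                dist = math.hypot(x - start[0], y - start[1])
                if dist < best_dist:  # ties keep the first candidate (assumed)
                    best, best_dist = (x, y), dist
    return best


def relocate(initial: List[Pixel], priority: List[int],
             width: int, height: int) -> Dict[int, Pixel]:
    """Assign one unique pixel per feature, most important features first."""
    occupied: Set[Pixel] = set()
    final: Dict[int, Pixel] = {}
    for j in priority:
        pixel = nearest_free_pixel(initial[j], occupied, width, height)
        if pixel is None:
            raise ValueError("the grid has fewer pixels than features")
        occupied.add(pixel)
        final[j] = pixel
    return final


# Features 0 and 2 collide at (0, 0); feature 0 has the higher priority.
initial_pixels = [(0, 0), (3, 3), (0, 0), (1, 2)]
final_positions = relocate(initial_pixels, priority=[0, 2, 1, 3], width=4, height=4)
```

With radius 0 the function simply returns the initial pixel when it is free, so the same call covers both the direct-mapping case and the relocation case.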
[0076] S6: Two-Dimensional Image Construction. After assigning pixel positions to all traffic statistics features, establish the mapping relationship between the traffic statistics features and the pixels of the two-dimensional image, and construct the corresponding two-dimensional image representation. This mapping can be formalized as a mapping function:
[0077] $$\phi: \{1, 2, \ldots, M\} \rightarrow \{0, \ldots, W - 1\} \times \{0, \ldots, H - 1\}$$
[0078] $$\phi(j) = p_j^{*}$$
[0079] where $M$ indicates the number of traffic statistics features, $W \times H$ represents the spatial resolution of the two-dimensional image, and $p_j^{*}$ is the final pixel position of the j-th feature. The mapping function $\phi$ is an injective mapping, ensuring that each feature corresponds to a unique pixel position.
[0080] Based on the mapping function $\phi$, let the feature vector of the i-th network traffic sample be:
[0081] $$\tilde{\mathbf{x}}_i = [\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{iM}]$$
[0082] where $\tilde{x}_{ij}$ represents the normalized value of the i-th network traffic sample on the j-th traffic statistical feature.
[0083] The two-dimensional image is then represented as $I_i$, specifically defined as:
[0084] $$I_i(a_j, b_j) = \tilde{x}_{ij}, \quad (a_j, b_j) = \phi(j), \quad j = 1, 2, \ldots, M$$
[0085] where $I_i(a, b)$ indicates the pixel value at pixel coordinates $(a, b)$ of the two-dimensional image corresponding to the i-th network traffic sample.
[0086] Under the above construction method, each traffic statistical feature assigned to a pixel location corresponds to a unique pixel location, and its pixel value reflects the normalized value of the traffic statistical feature in the current network traffic sample, thereby forming a two-dimensional traffic image representation with a fixed spatial structure, which can be used as a unified input representation for subsequent convolutional neural network classification modules and data augmentation modules.
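Given a fixed feature-to-pixel mapping, encoding a sample reduces to writing each normalized value into its assigned pixel. The sketch below assumes pixels are `(x_index, y_index)` tuples, that the image is stored row-major as `image[y][x]`, and that pixels without an assigned feature are filled with 0.0; the fill value is not stated in the source, and the mapping and sample values are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]


def encode_sample(sample: Sequence[float], mapping: Dict[int, Pixel],
                  width: int, height: int) -> List[List[float]]:
    """Write each normalized feature value into its assigned pixel."""
    # Assumption: pixels that no feature maps to stay at 0.0.
    image = [[0.0] * width for _ in range(height)]
    for j, value in enumerate(sample):
        x, y = mapping[j]
        image[y][x] = value
    return image


# One-to-one mapping for four features on a 4 x 4 grid (hypothetical).
mapping = {0: (0, 0), 1: (3, 3), 2: (0, 1), 3: (1, 2)}
sample = [0.5, 1.0, 0.25, 0.75]
image = encode_sample(sample, mapping, width=4, height=4)
```

Because the mapping is computed once and reused for every sample, a given feature always appears at the same pixel in training and inference images.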
[0087] S7: ACGAN Data Augmentation. Building upon the image representation constructed using MPFR encoding, this paper further introduces an Auxiliary Classifier Generative Adversarial Network (ACGAN) as a data augmentation module to supplement minority class attack samples in the training set and alleviate class imbalance. It is important to emphasize that ACGAN is not a necessary component for improving the performance of the MPFR model, but rather an optional augmentation module used to further enhance minority class discrimination capabilities, provided the encoding space is stable and semantically consistent.
[0088] ACGAN introduces class label constraints into traditional generative adversarial networks, enabling the generator to explicitly consider class information while generating samples. Let the generator be $G$, the discriminator be $D$, the random noise vector be $z$, and the category label be $c$. Under category constraints, the generator produces synthetic samples:
[0089] $$X_{fake} = G(z, c)$$
[0090] where $X_{fake}$ represents the generated traffic image sample.
[0091] The discriminator $D$ outputs two types of results: first, the probability $P(S \mid X)$ of judging whether the sample is real or fake, where $S \in \{\text{real}, \text{fake}\}$; second, the category prediction probability $P(C \mid X)$. The training objective of ACGAN consists of an adversarial loss and an auxiliary classification loss, with the adversarial loss defined as:
[0092] $$L_S = \mathbb{E}\big[\log P(S = \text{real} \mid X_{real})\big] + \mathbb{E}\big[\log P(S = \text{fake} \mid X_{fake})\big]$$
[0093] The auxiliary classification loss is defined as:
[0094] $$L_C = \mathbb{E}\big[\log P(C = c \mid X_{real})\big] + \mathbb{E}\big[\log P(C = c \mid X_{fake})\big]$$
[0095] During training, the discriminator maximizes $L_S + L_C$, simultaneously improving its ability to distinguish real from fake samples and to discriminate between categories; the generator, on the other hand, maximizes $L_C - L_S$, generating synthetic samples that conform to the feature distribution of the specified category while deceiving the discriminator. The optimization objective is to ensure that the generated samples not only approximate the real data distribution but also remain consistent with the specified labels in category semantics, making them more suitable for targeted sample supplementation in imbalanced scenarios.
[0096] In summary, by training the generator and discriminator adversarially, the generator can generate two-dimensional traffic images that are similar to the distribution of real samples, thereby supplementing the number of minority class samples and improving the class distribution of the training data.
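To make the two-part ACGAN objective concrete, the sketch below evaluates it for given discriminator outputs. It assumes the standard ACGAN formulation, in which the adversarial term $L_S$ is the log-likelihood of the correct real/fake source and the auxiliary term $L_C$ is the log-likelihood of the correct class, the discriminator maximizes $L_S + L_C$, and the generator maximizes $L_C - L_S$. The tuple layout, the function name `acgan_objectives`, and the probabilities are hypothetical; a real implementation would compute these terms on network outputs inside a deep learning framework.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One discriminator output: (probability that the sample is real,
#                            class probabilities, index of the true/target class)
Output = Tuple[float, Sequence[float], int]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def acgan_objectives(real: List[Output], fake: List[Output]) -> Dict[str, float]:
    """Evaluate the source and class log-likelihood terms on a batch."""
    # L_S: real samples should be judged real, generated samples fake.
    l_s = (_mean([math.log(p_real) for p_real, _, _ in real])
           + _mean([math.log(1.0 - p_real) for p_real, _, _ in fake]))
    # L_C: both real and generated samples should be assigned their class.
    l_c = (_mean([math.log(probs[c]) for _, probs, c in real])
           + _mean([math.log(probs[c]) for _, probs, c in fake]))
    return {
        "L_S": l_s,
        "L_C": l_c,
        "discriminator": l_s + l_c,  # quantity the discriminator maximizes
        "generator": l_c - l_s,      # quantity the generator maximizes
    }


# One real and one generated sample with hypothetical discriminator outputs.
real_batch = [(0.9, [0.8, 0.2], 0)]
fake_batch = [(0.2, [0.3, 0.7], 1)]
objectives = acgan_objectives(real_batch, fake_batch)
```

The class term appears with the same sign in both objectives, which is what ties generated minority-class images to their requested label.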
[0097] After training, the generated minority class two-dimensional image samples $\mathcal{D}_{gen}$, together with the original two-dimensional image samples $\mathcal{D}_{orig}$, form the expanded training dataset $\mathcal{D}_{aug} = \mathcal{D}_{orig} \cup \mathcal{D}_{gen}$.
[0098] S8: Classification Module. After completing MPFR image encoding and optional data augmentation, this paper employs a two-dimensional convolutional neural network (2D-CNN) to classify the constructed traffic image samples, effectively distinguishing between normal traffic and attack traffic. The design of this classification module follows the principles of simple structure, moderate number of parameters, and stable training, aiming to highlight the representational advantages brought by MPFR encoding, rather than relying on complex classifier structures for performance improvements.
[0099] Specifically, the classification module adopts a lightweight 2D-CNN structure whose input is a single-channel traffic image of size [size missing]. The network consists of a multi-layer two-dimensional convolutional feature extraction module and a fully connected classification module. The convolutional layers extract spatial features from the image layer by layer through local receptive fields, and the pooling layers compress the feature scale and enhance the translation invariance of the model. After spatial feature extraction is completed, the feature map is flattened and fed into the fully connected layer, and the predicted probability of each category is finally output through the Softmax function.
[0100] The experimental procedure of this invention is as follows:
[0101] Experimental environment
[0102] The experimental environment for this invention is Windows 10 operating system, CPU is Intel i7-10875H, GPU is RTX2060 (6GB), 16GB memory, Python 3.8 programming language, and model construction is completed in PyTorch 1.10 environment.
[0103] The training process uses a focal loss function to increase attention to the minority class, employs Adam as the optimizer, and sets the learning rate to 0.001, the training batch size to 32, and the number of training epochs to 15.
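Assuming the loss above is the standard focal loss, its effect on easy and hard samples can be illustrated as follows. The source gives no focusing or weighting parameters, so `gamma=2.0` and `alpha=1.0` are illustrative defaults and the probabilities are hypothetical.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], target: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]  # predicted probability of the true class
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# A well-classified sample contributes far less than a hard one.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.3, 0.7], target=0)
# With gamma = 0 the expression reduces to ordinary cross-entropy.
cross_entropy = focal_loss([0.9, 0.1], target=0, gamma=0.0)
```

The factor $(1 - p_t)^{\gamma}$ shrinks the loss of confidently classified samples, which are mostly majority-class traffic, so gradient updates are dominated by the harder minority-class samples.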
[0104] Evaluation indicators
[0105] This invention uses the following evaluation metrics: accuracy, recall (RC), precision (PR), and F1 score. Recall focuses on not missing positive cases, whereas precision focuses on reducing false positives. The F1 score jointly considers precision and recall; a higher value indicates better classification performance. These metrics are calculated as shown in formulas (16)-(19):
[0106] $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (16)$$
[0107] $$RC = \frac{TP}{TP + FN} \quad (17)$$
[0108] $$PR = \frac{TP}{TP + FP} \quad (18)$$
[0109] $$F1 = \frac{2 \times PR \times RC}{PR + RC} \quad (19)$$
[0110] where $TP$ is the number of samples correctly classified as C, $TN$ is the number of samples correctly classified as Not-C, $FP$ is the number of samples misclassified as C, and $FN$ is the number of samples misclassified as Not-C.
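The four metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here rather than something the source specifies.

```python
from typing import Dict


def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return {
        "accuracy": (tp + tn) / total if total > 0 else 0.0,
        "recall": recall,
        "precision": precision,
        "f1": f1_from_pr(precision, recall),
    }


# Hypothetical counts for one class C.
metrics = classification_metrics(tp=90, tn=880, fp=10, fn=20)
# Cross-check using the precision and recall reported for MPFR in the results.
mpfr_f1 = f1_from_pr(0.9477, 0.9717)
```

Applying the F1 formula to the precision (0.9477) and recall (0.9717) reported for MPFR in the results below gives about 0.9595, which agrees with the F1 score stated there.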
[0111] Experimental Results and Analysis
[0112] Table 1 Comparison with existing methods
[0113]
[0114] (1) Overall performance comparison
[0115] As shown in Table 1, different image-based modeling methods all achieve high classification performance on the CICIDS2017 dataset, indicating that mapping network traffic statistical features to two-dimensional images and using convolutional neural networks for modeling is feasible. Among them, the MPFR method proposed in this paper achieves the best overall performance, significantly outperforming the comparison methods in both Accuracy (0.9836) and F1 score (0.9595), verifying the effectiveness of the proposed method in attack traffic detection tasks.
[0116] (2) Impact analysis of improved coding methods (without GAN)
[0117] Without introducing GAN data augmentation, a comparison of 2D-CNN, MAGNETO, and MPFR-gan (MPFR encoding without GAN augmentation) reveals that as the feature-pixel mapping strategy is gradually optimized, the model performance shows a continuous upward trend.
[0118] Specifically, compared to simple two-dimensional image methods (2D-CNN), MAGNETO improves both precision and F1 score by introducing t-SNE to construct the feature space layout. On this basis, MPFR-gan further resolves feature conflicts through a mutual information-guided feature relocation mechanism, improving its precision to 0.9109 and its F1 score to 0.9192, which are significantly better than the previous two methods.
[0119] The results show that, without relying on GAN, the proposed MPFR encoding method can significantly improve the model's ability to distinguish attack traffic. The performance improvement mainly comes from the encoding strategy's full preservation of feature discrimination information.
[0120] (3) Analysis of the role of GAN data augmentation
[0121] Under the condition of fixed encoding method, the impact of introducing GAN data augmentation varies among different methods. For the MAGNETO method, after introducing GAN, its F1 score dropped from 0.9107 to 0.8922, and the precision also decreased, indicating that in the original MAGNETO encoding space, GAN-generated samples may introduce noise and fail to effectively improve the model's detection performance for attack traffic.
[0122] In contrast, introducing GANs within the MPFR encoding framework further improves model performance. The MPFR method outperforms MPFR-gan in accuracy, precision, and F1 score, especially maintaining a high level in precision and F1. This indicates that the MPFR encoding method can provide a more stable and discriminative feature space for GAN data augmentation, making the generated samples more conducive to training classification models.
[0123] (4) Comprehensive analysis of Precision and Recall
[0124] From the perspective of recall, all methods exhibit high recall values on the CICIDS2017 dataset, indicating that the models as a whole possess strong attack traffic detection capabilities. However, significant differences exist in the precision metric among the different methods. The 2D-CNN and MAGNETO methods have relatively low precision, suggesting that they suffer from some false positives while detecting attack traffic.
[0125] In contrast, the MPFR method maintains a high recall (0.9717) while raising precision to 0.9477, achieving a better balance between precision and recall. This result further indicates that the feature relocation mechanism, which preserves mutual information, helps reduce the false alarm rate and thereby improves the practical usability of intrusion detection systems.
[0126] Taken together, the analysis shows that the MPFR method proposed in this invention achieves the best classification performance on the CICIDS2017 dataset. The experimental results verify that: 1) the feature-pixel mapping strategy has a significant impact on the performance of image-based traffic classification; 2) the mutual information-guided feature relocation mechanism effectively improves discriminative ability at the encoding stage; 3) adding GAN data augmentation on top of MPFR encoding further improves model performance, but the encoding method itself remains the main source of the improvement.
[0127] Table 2 Comparison of classification performance under different data augmentation strategies
[0128]
[0129] Table 2 shows the impact of different data augmentation strategies on model performance under the MPFR coding framework. MPFR (SMOTE) denotes oversampling of minority class samples with the SMOTE method, while MPFR denotes the complete method, which uses GAN-based data augmentation.
[0130] The overall results show that both methods achieved high classification performance on the CICIDS2017 dataset, indicating that introducing data augmentation strategies based on MPFR encoding can effectively alleviate the class imbalance problem. However, comparing the specific metrics of the two methods reveals that the MPFR method outperforms MPFR(SMOTE) in Accuracy, Precision, and F1 score.
[0131] Specifically, while MPFR (SMOTE) achieved a high recall (0.9975), indicating its strong detection capability against attack traffic, its relatively low precision (0.8991) suggests that oversampling introduced a certain number of redundant or noisy samples, leading to an increased false positive rate. In contrast, the MPFR method, while maintaining a high recall (0.9717), improved precision to 0.9477, resulting in an F1 score of 0.9595, achieving a more balanced performance between precision and recall.
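As a quick consistency check on the figures quoted above, the F1 score can be recomputed from precision and recall as their harmonic mean. The sketch below is illustrative only: the helper name `f1_score` and the dictionary labels are assumptions, and the inputs are the rounded values given in the text, so the result for MPFR agrees with the stated 0.9595 only to within rounding.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall) pairs quoted in the text for the two augmentation strategies.
reported = {
    "MPFR (GAN)": (0.9477, 0.9717),
    "MPFR (SMOTE)": (0.8991, 0.9975),
}

# Recompute F1 for each strategy from the rounded inputs.
f1_by_method = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_method.items():
    print(f"{name}: F1 = {f1:.4f}")
```

Because F1 is a harmonic mean, it is pulled toward the smaller of the two inputs, which is why the SMOTE variant's very high recall does not offset its lower precision.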
[0132] The results show that, compared with traditional oversampling methods based on interpolation, GAN data augmentation can better characterize the distribution features of minority class samples and generate more discriminative synthetic samples. Introducing GAN data augmentation into the stable feature space provided by MPFR encoding helps to further improve the overall classification performance of the model.
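To make "oversampling based on interpolation" concrete, the following is a minimal sketch of the SMOTE-style interpolation step: a synthetic sample is placed on the line segment between a minority class sample and one of its nearest minority class neighbours. It is not the implementation used in the experiments; the function name `smote_like_sample`, the neighbour count `k`, and the requirement of at least two minority samples are assumptions made for illustration.

```python
import math
import random


def smote_like_sample(minority, k=3, rng=None):
    """Return one synthetic sample interpolated between a random minority
    sample and one of its k nearest minority neighbours.

    Assumes `minority` holds at least two equal-length numeric vectors.
    """
    rng = rng if rng is not None else random.Random()
    i = rng.randrange(len(minority))
    base = minority[i]
    # Indices of the other minority samples, nearest to `base` first.
    others = sorted(
        (j for j in range(len(minority)) if j != i),
        key=lambda j: math.dist(base, minority[j]),
    )
    neighbour = minority[rng.choice(others[:k])]
    lam = rng.random()  # interpolation weight in [0, 1)
    return [b + lam * (n - b) for b, n in zip(base, neighbour)]
```

Every synthetic point produced this way is a convex combination of two existing samples, so it cannot leave the region already spanned by the minority class; a generative model has no such restriction, which is the contrast drawn in the paragraph above.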
[0133] Example 2: To achieve the above objective and building on Example 1, as shown in the figure, this invention discloses a system that implements the feature relocation image coding method for network traffic classification, as follows:
[0134] Based on the above method embodiments, the present invention also provides a feature relocation image coding system for network traffic classification, including a data processing module; a two-dimensional feature embedding module; a grid mapping and collision detection module; a mutual information evaluation and ranking module; a feature relocation module; an image construction module; and a classification module. Each of the above modules is used to execute the corresponding functional steps in the foregoing embodiments. Each module can be implemented in software, hardware, or a combination of software and hardware.
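The grid mapping, ranking, relocation, and image construction modules listed above can be sketched as follows. This is a hedged illustration under stated assumptions, not the patented implementation: all function names are hypothetical; the grid scaling to `[0, n - 1]` is assumed; features are assumed to be already discretized for the mutual information estimate; unused pixels are assumed to be 0; the number of features is assumed not to exceed the number of pixels; and a brute-force nearest-free-pixel search (ties broken by row, then column) stands in for the spiral search.

```python
import math
from collections import Counter


def to_grid(coords, height, width):
    """Scale 2-D embedding coordinates (x, y) onto a height x width pixel grid."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def scale(v, lo, hi, n):
        # Assumption: map to [0, n - 1]; a constant axis collapses to index 0.
        return 0 if hi == lo else math.floor((v - lo) / (hi - lo) * (n - 1))

    return [(scale(y, y_lo, y_hi, height), scale(x, x_lo, x_hi, width))
            for x, y in coords]


def mutual_information(values, labels):
    """Mutual information (in nats) between a discrete feature and the labels."""
    n = len(values)
    joint = Counter(zip(values, labels))
    count_x, count_y = Counter(values), Counter(labels)
    # p(x, y) * log(p(x, y) / (p(x) * p(y))) summed over observed pairs.
    return sum(c / n * math.log(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in joint.items())


def relocate(initial, priority, height, width):
    """Give every feature a unique pixel, visiting features in priority order."""
    occupied, final = set(), {}
    for j in priority:
        target = initial[j]
        if target in occupied:
            free = [(r, c) for r in range(height) for c in range(width)
                    if (r, c) not in occupied]
            # Nearest free pixel by Euclidean distance to the initial position.
            target = min(free, key=lambda p: (math.dist(p, initial[j]), p))
        occupied.add(target)
        final[j] = target
    return final


def encode(sample, mapping, height, width):
    """Write each normalized feature value into its assigned pixel."""
    image = [[0.0] * width for _ in range(height)]  # assumption: unused pixels are 0
    for j, (r, c) in mapping.items():
        image[r][c] = sample[j]
    return image


# Toy data: three features, four samples; features 0 and 1 collide on the grid.
coords = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0)]
columns = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 0, 0]]  # discretized feature values
labels = [0, 0, 1, 1]

initial = to_grid(coords, 2, 2)
scores = [mutual_information(col, labels) for col in columns]
priority = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
mapping = relocate(initial, priority, 2, 2)
image = encode([0.2, 0.5, 0.9], mapping, 2, 2)
```

In the toy data, feature 1 is independent of the labels, so it ranks last and is the one moved off the contested pixel, while the informative feature 0 keeps its initial position.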
[0135] The technical means disclosed in this invention are not limited to those disclosed in the above embodiments, but also include technical solutions composed of any combination of the above technical features.
Claims
1. A feature relocation image coding method for network traffic classification, characterized in that it comprises the following steps:
S1. Acquire network traffic sample data, perform data cleaning and numerical normalization on the network traffic samples, and construct a feature matrix containing M traffic statistical features, where each row represents a sample feature vector and each column represents a traffic statistical feature.
S2. Taking the vector of values of each traffic statistical feature across all samples as the embedding object, map the traffic statistical feature vectors to a two-dimensional continuous space using a dimensionality reduction algorithm to obtain the corresponding two-dimensional embedding coordinates.
S3. Map the coordinates in the two-dimensional continuous space to a discrete pixel grid of preset resolution to obtain the initial pixel position of each traffic statistical feature, and detect whether a conflict exists in which multiple traffic statistical features are mapped to the same pixel position.
S4. Calculate the mutual information between each traffic statistical feature and the network traffic category label, and rank the traffic statistical features by importance according to the magnitude of the mutual information to obtain a priority sequence of traffic statistical features.
S5. Assign pixel positions to the traffic statistical features in turn according to the priority sequence: when the target pixel position is free, map the corresponding traffic statistical feature to that pixel; when the target pixel position is occupied, perform a spiral search centered on that pixel position to determine the nearest free pixel position and complete the relocation, thereby constructing a one-to-one mapping between traffic statistical features and pixels.
S6. Based on the one-to-one mapping, encode the sample feature vector of each network traffic sample into a two-dimensional image, where each pixel value is the normalized value of the corresponding traffic statistical feature in the current sample.
S7. Based on the obtained two-dimensional images, introduce an auxiliary classifier generative adversarial network to generate synthetic image samples of a specified category, so as to supplement the sample data of minority classes.
S8. Take the augmented two-dimensional images as input to a convolutional network classification model and output the corresponding network traffic category.
2. The feature relocation image coding method for network traffic classification according to claim 1, characterized in that the numerical normalization adopts min-max normalization, which linearly scales the values of each traffic statistical feature across all samples, as shown in equation (1):
$$\tilde{x}_{ij} = \frac{x_{ij} - x_j^{\min}}{x_j^{\max} - x_j^{\min}} \tag{1}$$
where $x_{ij}$ is the original value of the $j$-th traffic statistical feature in the $i$-th network traffic sample, $x_j^{\max}$ and $x_j^{\min}$ are the maximum and minimum values of the $j$-th traffic statistical feature over all samples, respectively, and $\tilde{x}_{ij}$ is the normalized value.
3. The feature relocation image coding method for network traffic classification according to claim 1, characterized in that, in S2, if the original dataset contains a total of $N$ network traffic samples and each sample contains $M$ traffic statistical features, then the normalized feature matrix is:
$$\tilde{X} = [\tilde{x}_{ij}] \in \mathbb{R}^{N \times M}$$
where each row represents a sample feature vector and each column represents a traffic statistical feature. Transposing the feature matrix yields a set of $M$ feature vectors:
$$F = \{f_1, f_2, \dots, f_M\}, \quad f_j = (\tilde{x}_{1j}, \tilde{x}_{2j}, \dots, \tilde{x}_{Nj})^{\top} \tag{3}$$
where $f_j$ is the vector of values of the $j$-th traffic statistical feature across all samples, and $\tilde{x}_{ij}$ is the value of the $i$-th network traffic sample on the $j$-th traffic statistical feature. A two-dimensional embedding is applied to each traffic statistical feature vector to obtain its two-dimensional embedding coordinates:
$$f_j \mapsto z_j = (u_j, v_j)$$
where $z_j$ is the two-dimensional coordinate point obtained by embedding the feature vector $f_j$, and $u_j$ and $v_j$ are the components of that point along the x-axis and y-axis, respectively.
4. The feature relocation image coding method for network traffic classification according to claim 1, characterized in that, in S3, the image resolution is set to $H \times W$, where $H$ is the number of pixel grid rows into which the image height is divided and $W$ is the number of pixel grid columns into which the image width is divided. The initial pixel position of the $j$-th traffic statistical feature is then:
$$r_j = \left\lfloor \frac{v_j - \min V}{\max V - \min V}\,(H - 1) \right\rfloor$$
$$c_j = \left\lfloor \frac{u_j - \min U}{\max U - \min U}\,(W - 1) \right\rfloor$$
where $(r_j, c_j)$ is the initial pixel position of the $j$-th traffic statistical feature, $\lfloor \cdot \rfloor$ denotes rounding down, and $U$ and $V$ are the sets of coordinate components of all $M$ features along the x-axis and y-axis, respectively.
5. The feature relocation image coding method for network traffic classification according to claim 1, characterized in that, in S4, mutual information is introduced as the measure of the importance of traffic statistical features. The mutual information between the traffic statistical feature vector $f_j$ and the category label $Y$ is defined as:
$$I(f_j; Y) = \sum_{x \in f_j} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$
where $I(f_j; Y)$ is the mutual information value between the $j$-th traffic statistical feature and the category label, $\mathcal{Y}$ is the set of possible values of the category label, $p(x, y)$ is the joint probability distribution of traffic statistical feature values and category labels, $p(x)$ is the marginal probability distribution of the traffic statistical feature values, and $p(y)$ is the marginal probability distribution of the category labels. The mutual information values of all $M$ traffic statistical features are calculated, and the features are sorted from largest to smallest to obtain the priority sequence of traffic statistical features:
$$\pi = \operatorname{argsort}_{\downarrow}\big(I(f_1; Y), I(f_2; Y), \dots, I(f_M; Y)\big)$$
where $\pi$ is the sorted sequence of traffic statistical feature indices and $\downarrow$ denotes descending order. In subsequent pixel allocation, traffic statistical features with higher mutual information preferentially occupy their initial mapping positions.
6. The feature relocation image coding method for network traffic classification according to claim 1, characterized in that, in S5, let $\Omega$ denote the set of currently occupied pixels; then, for a feature $f_j$ that is in conflict, its final pixel position is determined by solving the following problem:
$$p_j^{*} = \arg\min_{q \notin \Omega} d(q, p_j^{0})$$
where $d(\cdot, \cdot)$ is the Euclidean distance between pixels, $p_j^{*}$ is the final pixel position assigned to the $j$-th traffic statistical feature after relocation, $p_j^{0} = (r_j, c_j)$ is the initial pixel position of the $j$-th traffic statistical feature obtained in step S3, $q = (r_q, c_q)$ is a candidate pixel position in the pixel grid, and $\arg\min_{q \notin \Omega}$ indicates that, among the pixel positions satisfying $q \notin \Omega$, the one that minimizes the distance function is selected. The distance function is defined using the Euclidean distance:
$$d(q, p_j^{0}) = \sqrt{(r_q - r_j)^2 + (c_q - c_j)^2}$$
7. The feature relocation image coding method for network traffic classification according to claim 1, characterized in that S6 defines a mapping function from traffic statistical features to pixel positions:
$$\Phi : \{1, \dots, M\} \to \{0, \dots, H - 1\} \times \{0, \dots, W - 1\}$$
$$\Phi(j) = p_j^{*}$$
where $M$ is the number of traffic statistical features and $H \times W$ is the spatial resolution of the two-dimensional image. The mapping function $\Phi$ is an injective mapping, ensuring that each feature corresponds to a unique pixel position. Based on the mapping function $\Phi$, let the feature vector of the $i$-th network traffic sample be:
$$\tilde{x}_i = (\tilde{x}_{i1}, \tilde{x}_{i2}, \dots, \tilde{x}_{iM})$$
where $\tilde{x}_{ij}$ is the normalized value of the $i$-th network traffic sample on the $j$-th traffic statistical feature. The two-dimensional image is then represented as $I_i \in \mathbb{R}^{H \times W}$, defined as:
$$I_i(\Phi(j)) = \tilde{x}_{ij}, \quad j = 1, \dots, M$$
where $I_i(r, c)$ is the pixel value at pixel coordinates $(r, c)$ of the two-dimensional image corresponding to the $i$-th network traffic sample. Under the above construction, each traffic statistical feature assigned to a pixel position corresponds to a unique pixel position, and its pixel value reflects the normalized value of that traffic statistical feature in the current network traffic sample.