A semi-supervised contrastive learning-based encrypted traffic classification method and system, and a storage medium

By employing a semi-supervised contrastive learning method, combined with bi-branch feature extraction and the Mamba module, the scalability and robustness issues of encrypted traffic classification methods under unlabeled data and obfuscated traffic are addressed, achieving efficient and low-overhead encrypted traffic classification.

CN121881007B · Active · Publication Date: 2026-06-19 · HARBIN INSTITUTE OF TECHNOLOGY (SHENZHEN) (INSTITUTE OF SCIENCE AND TECHNOLOGY INNOVATION HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN)

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
HARBIN INSTITUTE OF TECHNOLOGY (SHENZHEN) (INSTITUTE OF SCIENCE AND TECHNOLOGY INNOVATION HARBIN INSTITUTE OF TECHNOLOGY SHENZHEN)
Filing Date
2026-03-19
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing encrypted traffic classification methods lack scalability when dealing with unlabeled data and are not robust enough when dealing with obfuscated traffic, making it difficult to cope with dynamic changes in the network environment and adversarial attacks.

Method used

We employ a semi-supervised contrastive learning approach that combines traffic embedding and enhancement, multi-granularity feature extraction with a dual-branch feature extractor and a Mamba module, and semi-supervised contrastive training. A cross-attention mechanism fuses header and payload features, and the FixMatch loss function generates high-confidence pseudo-labels, thereby improving the model's scalability and robustness.

Benefits of technology

It significantly reduces annotation costs, improves the model's discriminative ability and robustness in real network environments, and can effectively cope with dynamic network changes and attack obfuscation.



Abstract

This invention relates to a method, system, and storage medium for classifying encrypted traffic based on semi-supervised contrastive learning. The method includes: Step 1: After stream partitioning the original encrypted traffic, aligning the headers and payloads of data packets in the network flow, and generating enhanced headers and payloads according to an enhancement strategy; Step 2: Multi-granularity feature extraction: A dual-branch feature extractor is used to process the enhanced headers and payloads separately, and a fused flow-level representation is obtained using a cross-attention mechanism, followed by deep feature extraction using Mamba; Step 3: Semi-supervised contrastive learning: A fine-grained classifier for encrypted traffic is trained by combining the contrastive learning loss and cross-entropy loss for labeled data, and the FixMatch loss for unlabeled data. The beneficial effects of this invention are: through the feature structure constraints of contrastive learning and the high-confidence threshold filtering mechanism in FixMatch, noise propagation is jointly suppressed, training stability is improved, and labeling costs are significantly reduced.

Description

Technical Field

[0001] This invention relates to the field of computer technology, and in particular to a method, system, and storage medium for classifying encrypted traffic based on semi-supervised contrastive learning.

Background Technology

[0002] Traffic classification, a fundamental capability for network management and security, can identify specific services, applications, malware, and network attacks, contributing to improved network service quality and user experience, and supporting tasks such as malware identification and intrusion detection. Currently, with the steady improvement of users' cybersecurity awareness and the continuous refinement of domestic and international cybersecurity laws and regulations, network traffic encryption technology is widely used in network communications. While encrypted communication technology effectively protects user privacy and data security, it also facilitates the concealment of malicious traffic. For example, attackers can obfuscate their malicious activities to evade detection by security devices. Therefore, in the modern network environment, learning effective and robust encrypted traffic representations is crucial for traffic classification. Existing classification methods utilize machine learning models to learn the complex business patterns generated by various applications and protocols in the modern network environment, achieving high-accuracy encrypted traffic classification. However, existing methods still face two key challenges: scalability and robustness.

[0003] (1) Poor scalability: Machine learning training often relies heavily on large-scale labeled data. Re-collecting traffic and labels is costly in the face of constantly emerging new data. Therefore, methods based on pre-training and semi-supervised learning have been proposed to leverage abundant unlabeled traffic data.

[0004] (2) Insufficient robustness: In actual deployment, classification models not only face natural changes in the network environment (such as protocol updates and background traffic interference), but are also more susceptible to adversarial evasion attacks. Attackers can systematically perturb packet characteristics by actively obfuscating and disguising malicious traffic, thereby distorting the decision boundaries that the model has learned and causing it to fail.

[0005] Therefore, current classification methods for encrypted traffic suffer from insufficient scalability when faced with unlabeled data and poor robustness when faced with obfuscated traffic. Specifically, existing scalable classification methods are often limited by computational overhead and interference from noisy labels. Existing highly robust classification methods struggle to extract robust features with generalization capabilities from real-world dynamic network traffic.

Summary of the Invention

[0006] To address the above technical problems, this invention provides a method for classifying encrypted traffic based on semi-supervised contrastive learning. This method includes the following steps:

[0007] Step 1, Traffic Embedding and Enhancement: Divide the original encrypted traffic into streams and align the header and payload of the data packets in each stream to achieve dual independent encoding of the two; design traffic enhancement strategies to perturb, mask, or rearrange the data packet header and payload respectively to generate enhanced samples;

[0008] Step 2, multi-granularity feature extraction: A dual-branch feature extractor is used to process the augmented header and augmented payload separately, and the fused flow-level representation is obtained by using the cross-attention mechanism. Finally, the flow features are extracted by Mamba.

[0009] Step 3, Semi-supervised contrastive learning: Combine the contrastive learning loss and cross-entropy loss for labeled data with the FixMatch loss for unlabeled data to train a fine-grained classifier for encrypted traffic.

[0010] As a further improvement of the present invention, step 2 further includes:

[0011] Step 20: First, obtain byte-level representation through byte embedding and positional encoding, and then use depthwise separable convolution to perform local aggregation to form packet-level representation;

[0012] Step 21: Semantic alignment and fine-grained interaction between the header and payload are achieved through cross-attention mechanism. After fusion, a flow-level representation is obtained through residual connection. Then, the Mamba module is used to perform global context modeling and extract the final traffic features with long-range dependencies.

[0013] As a further improvement of the present invention, step 3 further includes:

[0014] Step 30: For labeled samples, supervised contrastive learning is used to make samples of the same class close to each other in the projection space and samples of different classes far apart, thereby learning feature representations with high discriminative power.

[0015] Step 31: For unlabeled samples, the FixMatch semi-supervised learning framework is used to generate high-confidence pseudo-labels based on weak augmented views and force the model to maintain prediction consistency on strong augmented views, thereby effectively utilizing unlabeled data and suppressing pseudo-label noise.

[0016] Step 32: Jointly optimize the contrastive learning loss and cross-entropy loss for labeled samples, and the consistency regularization loss for unlabeled samples, and train the classifier together.

[0017] As a further improvement of the present invention, step 1 further includes:

[0018] Step 10: Divide the raw traffic into independent sessions based on the source IP address, destination IP address, source port, destination port, and communication protocol. Each session contains a series of ordered data packets, denoted as $S = \{x_1, x_2, \dots, x_n\}$. Each data packet $x_i$ is divided into a header $h_i$ and a payload $l_i$, and structured alignment is performed;

[0019] Step 11: For the header, extract the network-layer IPv4 header and the transport-layer TCP/UDP header information, represented as $h_i = [h_i^{\mathrm{IP}}; h_i^{\mathrm{TCP}}; h_i^{\mathrm{UDP}}]$. For the payload, the length of each data packet payload is fixed to $m$: if the payload length exceeds $m$, it is truncated; otherwise, it is padded;

[0020] Step 12: Represent each session as an aligned header sequence $H = [h_1, \dots, h_N]$ and payload sequence $L = [l_1, \dots, l_N]$. Take the first N data packets in the session as its vector representation; if the session has fewer than N packets, the remainder is padded. Finally, all bytes are mapped to integers in [0, 255] as the final session vector representation.

[0021] As a further improvement of the present invention, in step 11, the IPv4 header $h_i^{\mathrm{IP}}$ includes a fixed field of 20 bytes and an optional field of up to 40 bytes, aligned to 60 bytes in total, with any shortfall filled with padding; the TCP header $h_i^{\mathrm{TCP}}$ is aligned to 60 bytes; the UDP header $h_i^{\mathrm{UDP}}$ is fixed at 8 bytes; each $h_i$ is uniformly represented as a 128-byte vector; the MAC address, IP address and port number fields are set to 0.

[0022] As a further improvement of the present invention, in step 1, the traffic enhancement strategy takes the original header sequence $H$ and payload sequence $L$ as input and outputs the enhanced header sequence $\tilde{H}$ and payload sequence $\tilde{L}$. Specifically, the header and payload of each data packet in the session are processed iteratively, and the following three types of enhancement strategies are applied at random, each with its own probability:

[0023] Perturbation strategy: with probability $p_{\mathrm{pert}}$, randomly modify header fields and insert random bytes into the payload to simulate network noise or packet injection attacks;

[0024] Masking strategy: with probability $p_{\mathrm{mask}}$, mask random fields in the header and randomly discard part of the payload data to simulate hidden protocol fields or partial loss of payload data;

[0025] Rearrangement strategy: with probability $p_{\mathrm{rearr}}$, randomly swap the order of selected fields in the header and rearrange the payload bytes to simulate field disorder or byte-level reassembly attacks.

[0026] As a further improvement of the present invention, step 20 further includes:

[0027] Step 201: Use a dual-branch feature extractor to extract features from the header and payload data respectively, and generate byte-level and packet-level structured representations step by step;

[0028] Step 202: For the byte-level representation, embed each byte in the header and payload into a $d$-dimensional vector space, and introduce a sinusoidal positional encoding to preserve the relative position information of bytes in the sequence. Specifically, for the $j$-th byte $b_j$ in the header, its embedding vector is denoted as $e_j$ and its positional encoding as $pe_j$. The two are added element by element to obtain the final byte-level representation $r_j$:

[0029] $$r_j = e_j + pe_j;$$

[0030] Step 203: Use depthwise separable convolutions to perform local feature aggregation on the byte-level representation $R_i$ of each data packet, generating a packet-level representation, as follows:

[0031] a depthwise convolution $\mathrm{DWConv}$ with $k \times k$ kernels processes each input channel of each data packet independently to extract local features; then a $1 \times 1$ pointwise convolution $\mathrm{PWConv}$ achieves cross-channel aggregation to obtain the local block embedding $E_i$:

[0032] $$E_i = \mathrm{PWConv}\big(\mathrm{DWConv}(R_i)\big), \quad i = 1, \dots, N$$

[0033] where N is the number of data packets; further, $E_i$ is unfolded along the spatial dimension and transformed through a linear projection to the $d$-dimensional space, forming the final packet-level feature representation. The packet-level representation of the header is denoted as $X_h$, and the packet-level representation of the payload is denoted as $X_p$;

[0034] Step 21 also includes:

[0035] Step 211: Achieve semantic alignment and fine-grained interaction between the two modalities through a cross-modal cross-attention mechanism, and generate a fused flow-level representation; in this attention, the header representation $X_h$ serves as the query $Q$, and the payload representation $X_p$ serves as the key $K$ and the value $V$. The attention output $A$ is calculated as:

[0036] $$A = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{c}}\right)V$$

[0037] where c is the number of output channels of the convolutional layer;

[0038] Step 212: The attention output $A$ is concatenated with the original header representation $X_h$ and residually connected to the original concatenated representation of the header and payload. After layer normalization, the fused flow-level representation $X_f$ is obtained:

[0039] $$X_f = \mathrm{LayerNorm}\big([A; X_h] + [X_h; X_p]\big).$$

[0040] Step 213: Use a standard Mamba module to perform global dependency modeling on the flow-level representation $X_f$. The Mamba module selectively propagates relevant state information with linear complexity, achieving efficient compression of long-range contexts. Based on the output of the Mamba module, global average pooling is performed along the sequence dimension to obtain the final traffic features $f$:

[0041] $$f = \mathrm{AvgPool}\big(\mathrm{Mamba}(X_f)\big).$$

[0042] As a further improvement of the present invention, step 30 further includes: for labeled samples, supervised contrastive learning is employed to enhance the intra-class compactness and inter-class separability of the feature space; specifically, the original features $f$ and enhanced features $\tilde{f}$ are fed into a projection head $g(\cdot)$ consisting of two fully connected layers and a ReLU activation function, which maps them to the contrastive embedding space to obtain the corresponding projected features $z$ and $\tilde{z}$:

[0043] $$z = g(f), \quad \tilde{z} = g(\tilde{f})$$

[0044] A supervised contrastive loss is applied within the projected feature space to pull samples of the same class together and push samples of different classes apart. For each anchor sample $z_i$, the positive sample set $P(i)$ consists of samples of the same class, and the remaining samples constitute the negative sample set. First, the cosine similarity between features is calculated, and then it is scaled by a temperature parameter $\tau$:

[0045] $$s_{i,j} = \frac{z_i^{\top} z_j}{\tau \, \lVert z_i \rVert \, \lVert z_j \rVert}$$

[0046] The contrastive loss function $\mathcal{L}_{con}$ is defined as:

[0047] $$\mathcal{L}_{con} = -\frac{1}{N_l} \sum_{i=1}^{N_l} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(s_{i,p})}{\sum_{a \neq i} \exp(s_{i,a})}$$

[0048] where $N_l$ represents the number of labeled samples in the current batch.

[0049] Step 31 specifically involves:

[0050] For unlabeled samples, a consistency regularization method based on FixMatch is used. The original features $f$ and enhanced features $\tilde{f}$ are fed into the classifier to obtain the predicted probability distributions $q$ and $\tilde{q}$. The classifier structure is similar to that of the projection head, with a Softmax layer at the end. The class in $q$ whose confidence is above the threshold $\delta$ is selected as the pseudo-label $\hat{q}$, and its cross-entropy loss against the strong-enhancement prediction $\tilde{q}$ is calculated, which constrains the model to produce consistent predictions for different views of the same unlabeled sample. The FixMatch loss is defined as follows:

[0051] $$\mathcal{L}_{fix} = \frac{1}{N_u} \sum_{i=1}^{N_u} \mathbb{1}\big(\max(q_i) > \delta\big) \, \mathrm{CE}(\hat{q}_i, \tilde{q}_i), \quad \hat{q}_i = \arg\max(q_i)$$

[0052] where $N_u$ is the number of unlabeled samples, and the indicator function $\mathbb{1}(\cdot)$ ensures that the loss is calculated only for high-confidence pseudo-labeled samples;

[0053] For the original features $f$ of labeled samples, the standard cross-entropy loss is applied:

[0054] $$\mathcal{L}_{ce} = -\frac{1}{N_l} \sum_{i=1}^{N_l} y_i^{\top} \log \hat{y}_i$$

[0055] where $N_l$ is the number of labeled samples, $\hat{y}_i$ is the prediction obtained from the original features, and $y_i$ is the label;

[0056] Step 32 specifically involves:

[0057] The model is trained by jointly optimizing the contrastive loss, the labeled sample cross-entropy loss, and the FixMatch consistency loss. The total loss function is:

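[0058] $$\mathcal{L} = \mathcal{L}_{ce} + \lambda_1 \mathcal{L}_{con} + \lambda_2 \mathcal{L}_{fix}$$

[0059] where $\lambda_1$ and $\lambda_2$ are hyperparameters used to balance the contributions of the various losses.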

[0060] The present invention also discloses an encrypted traffic classification system based on semi-supervised contrastive learning, comprising: a memory, a processor, and a computer program stored in the memory, wherein the computer program is configured to implement the steps of the method described in the present invention when invoked by the processor.

[0061] The present invention also discloses a computer-readable storage medium storing a computer program configured to implement the steps of the method described in the present invention when invoked by a processor.

[0062] The beneficial effects of this invention are: 1. To address the problem of relying on large-scale labeled data and poor scalability, it integrates contrastive learning and the FixMatch semi-supervised framework. Through the feature structure constraints of contrastive learning and the high-confidence threshold filtering mechanism in FixMatch, noise propagation is jointly suppressed, training stability is improved, and only a small number of labeled samples are needed to jointly utilize a large amount of unlabeled data, significantly reducing labeling costs; 2. To address the problems of insufficient model robustness and high training overhead, a traffic enhancement strategy that closely resembles the dynamics of real networks is designed to simulate real-world changes such as network jitter and attack obfuscation. Features are extracted from these scenarios through a low-overhead multi-granularity feature extraction network, which significantly reduces computational and memory overhead while ensuring discriminative ability, thereby improving the robustness of the model in the real world.

Attached Figure Description

[0063] Figure 1 is a flowchart of an encrypted traffic classification method based on semi-supervised contrastive learning according to the present invention.

[0064] Figure 2 is a schematic diagram of the multi-granularity feature extraction method of the present invention.

Detailed Implementation

[0065] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments.

[0066] This invention discloses a semi-supervised contrastive learning-based encrypted traffic classification method. It constructs robust training data by simulating real-world network dynamics and employs a dual-branch approach with a Mamba module to extract multi-granularity cross-modal features, significantly enhancing the model's robustness against traffic obfuscation and dynamic environmental changes. The combined use of contrastive learning and consistency regularization techniques enables semi-supervised training, efficiently utilizing unlabeled data while generating high-confidence pseudo-labels to resist noise, fundamentally improving the model's scalability. This semi-supervised contrastive learning-based encrypted traffic classification method is primarily applied in the field of computer technology, particularly in artificial intelligence technologies such as knowledge graphs.

[0067] This invention proposes an encrypted traffic classification method based on semi-supervised contrastive learning. By dual encoding and enhancing network traffic, multi-granularity cross-modal fusion features are extracted, and the classification model is trained by combining contrastive learning and semi-supervised learning.

[0068] The main steps of this invention include:

[0069] Step 1: Traffic Embedding and Enhancement; Step 1 implements traffic embedding and enhancement to avoid semantic confusion in network packets and enhance traffic diversity. The process includes:

[0070] (1) Divide the original encrypted traffic into streams and align the header and payload of the data packets in each stream to achieve dual independent encoding of the two;

[0071] (2) Design network dynamic change and attack obfuscation strategies, and perform perturbation, masking or rearrangement operations on the packet header and payload respectively to generate enhanced samples.

[0072] Step 2: Multi-granularity feature extraction; In step 2, dual-branch extraction and an attention mechanism are used to progressively extract and fuse features of different granularities, and Mamba is used to capture global features (i.e., the final traffic features). The implementation process is shown in Figure 2. The specific process includes:

[0073] Step 20: Use a dual-branch feature extractor to process the header and payload separately: First, obtain byte-level representation through byte embedding and positional encoding; then, use depthwise separable convolution to perform local aggregation to form a data packet-level representation;

[0074] Step 21: Semantic alignment and fine-grained interaction between the header and payload are achieved through cross-attention mechanism. After fusion, a stream-level representation is obtained through residual connection. Finally, the Mamba module is used to perform global context modeling and extract global features with long-range dependencies.

[0075] Step 3: Semi-supervised contrastive learning; In step 3, contrastive learning and pseudo-label-based consistency regularization are used together to collaboratively optimize the classifier on labeled and unlabeled data. The process includes:

[0076] Step 30: For labeled samples, supervised contrastive learning is used to make samples of the same class close to each other in the projection space and samples of different classes far apart, thereby learning feature representations with high discriminative power.

[0077] Step 31: For unlabeled samples, the FixMatch semi-supervised learning framework is used to generate high-confidence pseudo-labels based on weak augmented views and force the model to maintain prediction consistency on strong augmented views, thereby effectively utilizing unlabeled data and suppressing pseudo-label noise.

[0078] Step 32: Finally, jointly optimize the contrastive loss and cross-entropy loss of labeled samples, as well as the consistency regularization loss of unlabeled samples, and train the classifier together to achieve a synergistic improvement in the model's scalability and robustness.

[0079] The implementation process of the encrypted traffic classification technology proposed in this invention is shown in Figure 1. Next, the implementation process of the present invention will be described in detail.

[0080] Step 1: Traffic embedding and enhancement.

[0081] Step 10: First, divide the raw traffic into independent sessions based on the source IP address, destination IP address, source port, destination port, and communication protocol. Each session contains a series of ordered packets, denoted as $S = \{x_1, x_2, \dots, x_n\}$. Each data packet $x_i$ is divided into a header $h_i$ and a payload $l_i$, and structured alignment is then performed.
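The session split in Step 10 can be sketched as a grouping by 5-tuple. This is a minimal illustration, not the patent's implementation: the packet dictionary keys (`src_ip`, `dst_ip`, `src_port`, `dst_port`, `proto`, `ts`) are hypothetical, and the sketch treats each direction of a connection as its own session, which the text does not specify.

```python
from collections import defaultdict

def split_sessions(packets):
    """Group packets into sessions keyed by the 5-tuple.

    Each packet is a dict with hypothetical keys: src_ip, dst_ip,
    src_port, dst_port, proto and ts (capture timestamp).
    """
    sessions = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        sessions[key].append(pkt)
    # Keep the packets of every session in time order
    for pkts in sessions.values():
        pkts.sort(key=lambda p: p["ts"])
    return dict(sessions)

example = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 5000,
     "dst_port": 443, "proto": "TCP", "ts": 3},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.2", "src_port": 6000,
     "dst_port": 53, "proto": "UDP", "ts": 2},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 5000,
     "dst_port": 443, "proto": "TCP", "ts": 1},
]
sessions = split_sessions(example)
```

A bidirectional variant would normalise the key, for example by sorting the two (address, port) endpoints before grouping.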

[0082] Step 11: For the header $h_i$ of the $i$-th packet, mainly extract the network-layer IPv4 header and the transport-layer TCP/UDP header information, represented as $h_i = [h_i^{\mathrm{IP}}; h_i^{\mathrm{TCP}}; h_i^{\mathrm{UDP}}]$. The IPv4 header $h_i^{\mathrm{IP}}$ includes a fixed field of 20 bytes and an optional field of up to 40 bytes, aligned to 60 bytes in total; any shortfall is filled with padding. The TCP header $h_i^{\mathrm{TCP}}$ has a similar structure and is also aligned to 60 bytes. The UDP header $h_i^{\mathrm{UDP}}$ is fixed at 8 bytes. Ultimately, each $h_i$ is uniformly represented as a 128-byte vector. Furthermore, to prevent overfitting due to information leakage, sensitive fields such as MAC address, IP address, and port number are set to 0.

[0083] Step 12: For the payload, fix the payload length of each data packet to $m$: if the payload length exceeds $m$, it is truncated; otherwise, it is padded. After the above processing, each session can be represented as an aligned header sequence $H = [h_1, \dots, h_N]$ and payload sequence $L = [l_1, \dots, l_N]$. Take the first N packets in the session as its vector representation; if the session has fewer than N packets, the remainder is padded. Finally, all bytes are mapped to integers in [0, 255] as the final session vector representation.
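The alignment in Steps 11 and 12 can be sketched as follows. Several details here are assumptions rather than statements from the text: the padding byte value (taken as 0), the layout of the 128-byte header as a 60-byte IPv4 slot followed by a 60-byte TCP slot and an 8-byte UDP slot, and the function names `fit`, `align_header` and `align_session`. The zeroed offsets follow the standard IPv4, TCP and UDP header layouts.

```python
HEADER_LEN = 128   # 60 (IPv4) + 60 (TCP) + 8 (UDP), as stated in the text
PAD = 0            # assumption: the padding value is not given in the text

def fit(data: bytes, length: int) -> bytes:
    """Truncate or right-pad `data` to exactly `length` bytes."""
    return data[:length] + bytes([PAD]) * max(0, length - len(data))

def align_header(ipv4: bytes, tcp: bytes = b"", udp: bytes = b"") -> bytes:
    ip = bytearray(fit(ipv4, 60))
    t = bytearray(fit(tcp, 60))
    u = bytearray(fit(udp, 8))
    ip[12:20] = bytes(8)      # zero source and destination IP addresses
    if tcp:
        t[0:4] = bytes(4)     # zero TCP source and destination ports
    if udp:
        u[0:4] = bytes(4)     # zero UDP source and destination ports
    return bytes(ip + t + u)

def align_session(packets, n_packets, payload_len):
    """packets: list of (ipv4, tcp, udp, payload) byte tuples of one session.

    Returns header and payload sequences as lists of ints in [0, 255].
    """
    headers, payloads = [], []
    for ipv4, tcp, udp, payload in packets[:n_packets]:
        headers.append(list(align_header(ipv4, tcp, udp)))
        payloads.append(list(fit(payload, payload_len)))
    while len(headers) < n_packets:   # pad sessions with too few packets
        headers.append([PAD] * HEADER_LEN)
        payloads.append([PAD] * payload_len)
    return headers, payloads

H, L = align_session(
    [(bytes(range(1, 21)), bytes([0xAA] * 20), b"", b"hello world")],
    n_packets=2, payload_len=4)
```

Every session then has the same shape, `n_packets × 128` for headers and `n_packets × payload_len` for payloads, which is what the two extractor branches consume.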

[0084] Then, to improve the model's adaptability to dynamic network environments and adversarial perturbations, a traffic augmentation strategy is designed and implemented. This strategy takes the original header sequence $H$ and payload sequence $L$ as input and outputs the enhanced header sequence $\tilde{H}$ and payload sequence $\tilde{L}$. Specifically, the header and payload of each data packet in the session are processed iteratively, and the following three types of enhancement strategies are applied at random, each with its own probability:

[0085] (1) Perturbation strategy: with probability $p_{\mathrm{pert}}$, randomly modify header fields and insert random bytes into the payload to simulate network noise or packet injection attacks;

[0086] (2) Masking strategy: with probability $p_{\mathrm{mask}}$, mask random fields in the header and randomly discard part of the payload data to simulate hidden protocol fields or partial loss of payload data;

[0087] (3) Rearrangement strategy: with probability $p_{\mathrm{rearr}}$, randomly swap the order of selected fields in the header and rearrange the payload bytes to simulate field disorder or byte-level reassembly attacks.
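The three strategies above can be sketched for a single packet as below. This is an illustrative simplification: the header is treated as consecutive 4-byte "fields" (`field_len` is a hypothetical parameter, since real header fields have varying widths), masked bytes are set to 0, and the payload is kept at its aligned length after each operation. None of these choices is specified in the text.

```python
import random

def augment_packet(header, payload, p_pert, p_mask, p_rearr, rng, field_len=4):
    """Return augmented copies of one packet's header and payload.

    header, payload: lists of ints in [0, 255]; payload must be non-empty.
    """
    h, l = list(header), list(payload)
    n_fields = len(h) // field_len
    if rng.random() < p_pert:
        # Perturbation: overwrite one header field, inject one payload byte
        f = rng.randrange(n_fields) * field_len
        h[f:f + field_len] = [rng.randrange(256) for _ in range(field_len)]
        l.insert(rng.randrange(len(l) + 1), rng.randrange(256))
        l = l[:len(payload)]                  # keep the aligned length
    if rng.random() < p_mask:
        # Masking: zero one header field, drop a payload span and re-pad
        f = rng.randrange(n_fields) * field_len
        h[f:f + field_len] = [0] * field_len
        start = rng.randrange(len(l))
        drop = rng.randrange(1, len(l) - start + 1)
        l = l[:start] + l[start + drop:] + [0] * drop
    if rng.random() < p_rearr:
        # Rearrangement: swap two header fields, shuffle payload bytes
        a = rng.randrange(n_fields) * field_len
        b = rng.randrange(n_fields) * field_len
        h[a:a + field_len], h[b:b + field_len] = h[b:b + field_len], h[a:a + field_len]
        rng.shuffle(l)
    return h, l

header, payload = list(range(128)), list(range(16))
h_same, l_same = augment_packet(header, payload, 0.0, 0.0, 0.0, random.Random(0))
h_aug, l_aug = augment_packet(header, payload, 1.0, 1.0, 1.0, random.Random(1))
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible, which helps when comparing weakly and strongly augmented views of the same session.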

[0088] Step 2: Multi-granularity feature extraction;

[0089] Step 20 includes:

[0090] Step 201: First, a dual-branch feature extractor is used to extract features from the header and payload data respectively, generating byte-level and data packet-level structured representations step by step.

[0091] Step 202: For the byte-level representation, embed each byte in the header and payload into a $d$-dimensional vector space, and introduce a sinusoidal positional encoding to preserve the relative position information of bytes in the sequence. Specifically, for the $j$-th byte $b_j$ in the header, its embedding vector is denoted as $e_j$ and its positional encoding as $pe_j$. The two are added element by element to obtain the final byte-level representation $r_j$:

[0092] $$r_j = e_j + pe_j$$
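The byte-level representation of Step 202 can be illustrated with the standard Transformer sinusoidal encoding. This is a sketch under assumptions: the patent text does not give the exact encoding formula, the random `table` stands in for a learned embedding matrix, and the dimension `d = 8` is chosen only for the example.

```python
import math
import random

def sinusoidal_pe(seq_len, d):
    """Sinusoidal positional encoding, shape [seq_len][d]."""
    pe = [[0.0] * d for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d, 2):
            angle = pos / (10000 ** (i / d))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def byte_level_repr(byte_seq, table, pe):
    """Element-wise sum of byte embedding and positional encoding."""
    return [[e + p for e, p in zip(table[b], pe[j])]
            for j, b in enumerate(byte_seq)]

d = 8
rng = random.Random(0)
# Stand-in for a learned embedding table with one row per byte value
table = [[rng.gauss(0.0, 0.02) for _ in range(d)] for _ in range(256)]
pe = sinusoidal_pe(128, d)
R = byte_level_repr([69, 0, 0, 60], table, pe)
```

Because the encoding depends only on the position, the same `pe` table can be reused for every header in a batch.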

[0093] Step 203: To reduce computational complexity, depthwise separable convolutions are used to perform local feature aggregation on the byte-level representation $R_i$ of each data packet, generating a packet-level representation. First, a depthwise convolution $\mathrm{DWConv}$ with $k \times k$ kernels processes each input channel of each data packet independently to extract local features; then a $1 \times 1$ pointwise convolution $\mathrm{PWConv}$ achieves cross-channel aggregation to obtain the local block embedding $E_i$:

[0094] $$E_i = \mathrm{PWConv}\big(\mathrm{DWConv}(R_i)\big), \quad i = 1, \dots, N$$

[0095] where N is the number of data packets. Furthermore, $E_i$ is unfolded along the spatial dimension and transformed through a linear projection to the $d$-dimensional space, forming the final packet-level feature representation. The packet-level representation of the header is denoted as $X_h$, and the packet-level representation of the payload is denoted as $X_p$.
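To show why the depthwise separable form reduces cost, here is a small one-dimensional sketch with a parameter count comparison. The 1-D layout, the "valid" padding and the example sizes are assumptions made for brevity; the text describes the operation only at the level of a depthwise stage followed by a pointwise stage.

```python
def depthwise_conv1d(x, kernels):
    """x: [channels][length]; kernels: [channels][k]. One filter per channel."""
    k = len(kernels[0])
    return [[sum(ch[t + j] * w[j] for j in range(k))
             for t in range(len(ch) - k + 1)]
            for ch, w in zip(x, kernels)]

def pointwise_conv1d(x, weights):
    """x: [c_in][length]; weights: [c_out][c_in]. Mixes channels per position."""
    length = len(x[0])
    return [[sum(w[c] * x[c][t] for c in range(len(x))) for t in range(length)]
            for w in weights]

def param_counts(c_in, c_out, k):
    """Weights in a standard conv versus a depthwise separable conv (no bias)."""
    standard = c_in * c_out * k
    separable = c_in * k + c_in * c_out
    return standard, separable

x = [[1, 2, 3, 4], [0, 1, 0, 1]]
dw = depthwise_conv1d(x, [[1, 1], [1, -1]])
out = pointwise_conv1d(dw, [[1, 1], [2, 0]])
standard, separable = param_counts(c_in=64, c_out=128, k=3)
```

For the example sizes the separable form needs 8,384 weights against 24,576 for a standard convolution, roughly a third, and the gap widens with larger kernels.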

[0096] Step 21 also includes:

[0097] Step 211: After obtaining the packet-level representations of the header and payload, a cross-modal cross-attention mechanism is used to achieve semantic alignment and fine-grained interaction between the two modalities, generating a fused flow-level representation. In this attention mechanism, the header representation $X_h$ serves as the query $Q$, and the payload representation $X_p$ serves as the key $K$ and the value $V$. The attention output $A$ is calculated as:

[0098] $$A = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{c}}\right)V$$

[0099] Here, c is the number of output channels of the convolutional layer. The attention mechanism enables the header features to adaptively focus on semantically relevant parts of the payload, thereby effectively capturing flow-level dependencies.
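The cross-attention of Step 211 can be sketched with plain lists, using the header rows as queries and the payload rows as keys and values. This is a minimal sketch of scaled dot-product attention: it omits any learned projection matrices, and scaling the scores by the square root of `c` is an assumption based on the text's mention of c alongside the formula.

```python
import math

def softmax(row):
    m = max(row)                          # subtract the max for stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(X_h, X_p, c):
    """Each header row attends over all payload rows (Q = X_h, K = V = X_p)."""
    out = []
    for q in X_h:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in X_p]
        w = softmax(scores)
        out.append([sum(w[j] * X_p[j][t] for j in range(len(X_p)))
                    for t in range(len(X_p[0]))])
    return out

A = cross_attention([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], c=2)
B = cross_attention([[5.0, -1.0]], [[2.0, 3.0], [2.0, 3.0]], c=2)
```

Each output row is a weighted average of payload rows, so a header packet draws most of its fused content from the payload packets it is most similar to.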

[0100] Step 212: Furthermore, the attention output $A$ is concatenated with the original header representation $X_h$ and residually connected to the original concatenated representation of the header and payload. After layer normalization, the fused flow-level representation $X_f$ is obtained:

[0101] $$X_f = \mathrm{LayerNorm}\big([A; X_h] + [X_h; X_p]\big).$$

[0102] Step 213: Finally, a standard Mamba module performs global dependency modeling on the flow-level representation $X_f$. The Mamba module selectively propagates relevant state information with linear complexity, achieving efficient compression of long-range contexts. Based on the output of the Mamba module, global average pooling is performed along the sequence dimension to obtain the final traffic features $f$:

[0103] $$f = \mathrm{AvgPool}\big(\mathrm{Mamba}(X_f)\big).$$
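The sketch below is not the Mamba module. It is a toy input-dependent linear recurrence, shown only to illustrate two ideas from the paragraph above: state is propagated selectively in a single pass whose cost grows linearly with sequence length, and the per-step outputs are then averaged along the sequence dimension. The gate weights `w_gate` and the sigmoid gating form are hypothetical.

```python
import math

def selective_scan(X, w_gate):
    """Toy gated recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    X: [T][d] sequence; w_gate: [d] hypothetical gate weights.
    The decay a_t = sigmoid(w_gate . x_t) depends on the current input,
    so each step decides how much earlier context to keep.
    """
    h = [0.0] * len(X[0])
    out = []
    for x in X:                            # one pass over the sequence
        a = 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(w_gate, x))))
        h = [a * hi + (1.0 - a) * xi for hi, xi in zip(h, x)]
        out.append(h)
    return out

def global_avg_pool(Y):
    """Average the per-step outputs along the sequence dimension."""
    return [sum(row[t] for row in Y) / len(Y) for t in range(len(Y[0]))]

Y = selective_scan([[2.0, 4.0], [2.0, 4.0]], w_gate=[0.0, 0.0])
f = global_avg_pool(Y)
```

With zero gate weights the decay is 0.5 at every step, so the example reduces to an exponential moving average; a real selective state-space layer learns the gating and uses a higher-dimensional state.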

[0104] Step 3: Semi-supervised contrastive learning.

[0105] By following steps 1 and 2, the deep flow features corresponding to the original header and payload (referred to as original features $f$) and the deep flow features corresponding to the enhanced header and payload (referred to as enhanced features $\tilde{f}$) can be obtained. Based on this, contrastive learning and semi-supervised consistency learning are used for joint training on labeled and unlabeled data, respectively, to improve the model's feature discriminativeness and pseudo-label robustness.

[0106] For labeled samples, supervised contrastive learning is employed to enhance the intra-class compactness and inter-class separability of the feature space. Specifically, the original features $f$ and enhanced features $\tilde{f}$ are fed into a projection head $g(\cdot)$ consisting of two fully connected layers and a ReLU activation function, which maps them to the contrastive embedding space to obtain the corresponding projected features $z$ and $\tilde{z}$:

[0107] $$z = g(f), \quad \tilde{z} = g(\tilde{f})$$

[0108] A supervised contrastive loss is applied within the projected feature space to pull samples of the same class together and push samples of different classes apart. For each anchor sample $z_i$, the positive sample set $P(i)$ consists of samples of the same class, and the remaining samples constitute the negative sample set. First, the cosine similarity between features is calculated, and then it is scaled by a temperature parameter $\tau$:

[0109] $$s_{i,j} = \frac{z_i^{\top} z_j}{\tau \, \lVert z_i \rVert \, \lVert z_j \rVert}$$

[0110] The contrastive loss function $\mathcal{L}_{con}$ is defined as:

[0111] $$\mathcal{L}_{con} = -\frac{1}{N_l} \sum_{i=1}^{N_l} \frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(s_{i,p})}{\sum_{a \neq i} \exp(s_{i,a})}$$

[0112] where $N_l$ represents the number of labeled samples in the current batch.
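A supervised contrastive loss of the kind described above can be sketched in plain Python. This follows the commonly used SupCon formulation, which is an assumption about the exact form intended here: features are L2-normalised so that the dot product is the cosine similarity, the temperature default `tau=0.1` is illustrative, and the result is averaged over anchors that have at least one positive.

```python
import math

def supcon_loss(Z, labels, tau=0.1):
    """Z: projected feature vectors (original and enhanced views stacked);
    labels: class id per vector. Returns the mean loss over valid anchors."""
    def unit(v):
        norm = math.sqrt(sum(a * a for a in v))
        return [a / norm for a in v]

    Z = [unit(z) for z in Z]
    n = len(Z)
    total, anchors = 0.0, 0
    for i in range(n):
        sims = [sum(a * b for a, b in zip(Z[i], Z[j])) / tau for j in range(n)]
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue                       # an anchor needs a same-class sample
        m = max(sims[j] for j in others)   # log-sum-exp for stability
        log_denom = m + math.log(sum(math.exp(sims[j] - m) for j in others))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

Z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_clustered = supcon_loss(Z, [0, 0, 1, 1])
loss_mixed = supcon_loss(Z, [0, 1, 0, 1])
```

The loss is small when same-class vectors already point in similar directions (`loss_clustered`) and large when the labels cut across the geometry (`loss_mixed`), which is the pressure that tightens classes during training.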

[0113] For unlabeled samples, a consistency regularization method based on FixMatch is used. The original features $f$ and enhanced features $\tilde{f}$ are fed into the classifier to obtain the predicted probability distributions $q$ and $\tilde{q}$. The classifier structure is similar to that of the projection head, with a Softmax layer at the end. The class in $q$ whose confidence is above the threshold $\delta$ is selected as the pseudo-label $\hat{q}$, and its cross-entropy loss against the strong-enhancement prediction $\tilde{q}$ is calculated, which constrains the model to produce consistent predictions for different views of the same unlabeled sample. The FixMatch loss is defined as follows:

[0114] $$\mathcal{L}_{fix} = \frac{1}{N_u} \sum_{i=1}^{N_u} \mathbb{1}\big(\max(q_i) > \delta\big) \, \mathrm{CE}(\hat{q}_i, \tilde{q}_i), \quad \hat{q}_i = \arg\max(q_i)$$

[0115] where $N_u$ is the number of unlabeled samples, and the indicator function $\mathbb{1}(\cdot)$ ensures that the loss is calculated only for high-confidence pseudo-labeled samples.
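The thresholded consistency loss can be sketched as below, taking the prediction on the original features as the weak view and the prediction on the enhanced features as the strong view. The default threshold of 0.95 is an illustrative value, not one given in the text, and the function name is hypothetical.

```python
import math

def fixmatch_loss(q_weak, q_strong, threshold=0.95):
    """q_weak, q_strong: probability vectors for the same unlabeled samples.

    A sample contributes only if its weak-view confidence exceeds the
    threshold; its argmax class then serves as the pseudo-label for the
    cross-entropy on the strong view.
    """
    total = 0.0
    for qw, qs in zip(q_weak, q_strong):
        confidence = max(qw)
        if confidence > threshold:         # indicator function
            pseudo_label = qw.index(confidence)
            total += -math.log(max(qs[pseudo_label], 1e-12))
    return total / len(q_weak)             # averaged over all unlabeled samples

q_weak = [[0.97, 0.03], [0.60, 0.40]]
q_strong = [[0.50, 0.50], [0.10, 0.90]]
loss = fixmatch_loss(q_weak, q_strong)
```

In the example only the first sample passes the threshold, so the second, less certain prediction cannot inject a noisy pseudo-label into training.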

[0116] In addition, to enhance the discriminative power of classification boundaries, the standard cross-entropy loss is applied to the original features $f$ of the labeled samples:

[0117] $$\mathcal{L}_{ce} = -\frac{1}{N_l} \sum_{i=1}^{N_l} y_i^{\top} \log \hat{y}_i$$

[0118] where $N_l$ is the number of labeled samples, $\hat{y}_i$ represents the prediction obtained from the original features, and $y_i$ represents the label.

[0119] Finally, the model is trained by jointly optimizing the contrastive loss, labeled sample cross-entropy loss, and FixMatch consistency loss. The total loss function is:

[0120] $$\mathcal{L} = \mathcal{L}_{ce} + \lambda_1 \mathcal{L}_{con} + \lambda_2 \mathcal{L}_{fix}$$

[0121] where $\lambda_1$ and $\lambda_2$ are hyperparameters that balance the contributions of the various losses. This joint optimization mechanism enables the model to effectively utilize unlabeled samples to improve generalization ability and robustness while simultaneously using labeled samples for discriminative learning.
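The joint objective can be sketched as a weighted sum of the three loss values. Which terms carry the two balancing hyperparameters is an assumption: the sketch leaves the cross-entropy term unweighted and scales the contrastive and FixMatch terms, with hypothetical names `lam_con` and `lam_fix`.

```python
import math

def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class for labeled samples."""
    return -sum(math.log(max(p[y], 1e-12))
                for p, y in zip(probs, labels)) / len(labels)

def total_loss(l_ce, l_con, l_fix, lam_con=1.0, lam_fix=1.0):
    """Weighted sum of the supervised, contrastive and consistency losses."""
    return l_ce + lam_con * l_con + lam_fix * l_fix

l_ce = cross_entropy([[0.5, 0.5], [0.25, 0.75]], [0, 1])
loss = total_loss(l_ce=1.0, l_con=2.0, l_fix=3.0, lam_con=0.5, lam_fix=0.1)
```

In practice the two weights would be tuned on a validation set, since too large a weight on the consistency term lets early pseudo-label errors dominate.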

[0122] The present invention also discloses an encrypted traffic classification system based on semi-supervised contrastive learning, comprising: a memory, a processor, and a computer program stored in the memory, wherein the computer program is configured to implement the steps of the method described in the present invention when invoked by the processor.

[0123] The present invention also discloses a computer-readable storage medium storing a computer program configured to implement the steps of the method described in the present invention when invoked by a processor.

[0124] This invention deeply couples FixMatch-based semi-supervised learning and supervised contrastive learning to construct a mutually reinforcing training framework. It utilizes the structured, discriminative feature space formed by contrastive learning to guide and improve the quality and stability of pseudo-labels. Compared to existing technologies, this invention overcomes the performance bottleneck caused by the accumulation of pseudo-label noise.

[0125] This invention proposes a multi-granularity feature extraction network that accurately extracts encrypted traffic features with strong discriminative power and high robustness from obfuscated traffic, achieving a balance between performance and efficiency. Compared with existing technologies, this invention more comprehensively considers the semantic and hierarchical characteristics of network traffic, while having lower computational overhead, higher robustness, and extremely high practical deployment value.

[0126] The above description, in conjunction with specific preferred embodiments, provides a further detailed explanation of the present invention. It should not be construed that the specific implementation of the present invention is limited to these descriptions. For those skilled in the art, various simple deductions or substitutions can be made without departing from the concept of the present invention, and all such modifications and substitutions should be considered within the scope of protection of the present invention.

Claims

1. A method for classifying encrypted traffic based on semi-supervised contrastive learning, characterized in that the method comprises the following steps:
Step 1, traffic embedding and enhancement: divide the original encrypted traffic into flows, and align the header and payload of the data packets in each flow so that the two are encoded independently; design traffic enhancement strategies that perturb, mask, or rearrange the data packet header and payload respectively to generate enhanced samples;
Step 2, multi-granularity feature extraction: use a dual-branch feature extractor to process the enhanced header and the enhanced payload separately, obtain a fused flow-level representation by means of a cross-attention mechanism, and finally extract the flow features with Mamba;
Step 3, semi-supervised contrastive learning: combine the contrastive learning loss and cross-entropy loss on labeled data with the FixMatch loss on unlabeled data to train a fine-grained classifier for encrypted traffic;
wherein Step 2 further comprises:
Step 201: use the dual-branch feature extractor to extract features from the header and the payload respectively, generating byte-level and packet-level structured representations step by step;
Step 202: for the byte-level representation, embed each byte of the header and payload into a multi-dimensional vector space, and introduce sinusoidal positional encoding to preserve the relative position information of bytes in the sequence; specifically, for a byte in the header, its embedding vector and its positional encoding are added element-wise to obtain the final byte-level representation;
Step 203: apply depthwise separable convolution to the byte-level representation for local feature aggregation, generating a packet-level representation, as follows: a depthwise convolution processes each input channel of each data packet independently to extract local features; a pointwise convolution then performs cross-channel aggregation to obtain the local block embedding, where N is the number of data packets; the local block embedding is further unfolded along the spatial dimension and mapped by linear projection into a multi-dimensional space to form the final packet-level feature representation, giving one packet-level representation for the header and one for the payload;
Step 204: achieve semantic alignment and fine-grained interaction between the two modalities through a cross-modal attention mechanism, and generate a fused flow-level representation; in this attention mechanism, the packet-level representation of one modality serves as the query vector and that of the other modality serves as the key vector and value vector, from which the attention output is calculated, where c is the number of output channels of the convolutional layer;
Step 205: concatenate the attention output with the packet-level representation of the original header, combine the result through a residual connection with the concatenation of the packet-level representation of the original header and the packet-level representation of the original payload, and apply layer normalization to obtain the fused flow-level representation;
Step 206: use a standard Mamba module to perform global dependency modeling on the flow-level representation, the Mamba module selectively propagating relevant state information with linear complexity and thereby compressing long-range context efficiently; global average pooling is then performed on the output of the Mamba module along the sequence dimension to obtain the final traffic features.

2. The encrypted traffic classification method according to claim 1, characterized in that Step 3 further comprises:
Step 30: for labeled samples, supervised contrastive learning is used to pull samples of the same class close together in the projection space and push samples of different classes apart, thereby learning highly discriminative feature representations;
Step 31: for unlabeled samples, the FixMatch semi-supervised learning framework is used to generate high-confidence pseudo-labels from weakly augmented views and to force the model to keep its predictions consistent on strongly augmented views, thereby making effective use of unlabeled data and suppressing pseudo-label noise;
Step 32: the contrastive learning loss and cross-entropy loss for labeled samples and the consistency regularization loss for unlabeled samples are jointly optimized to train the classifier.

3. The encrypted traffic classification method according to claim 1, characterized in that Step 1 further comprises:
Step 10: divide the raw traffic into independent sessions based on the source IP address, destination IP address, source port, destination port, and communication protocol, each session containing a series of ordered data packets; divide each data packet into a header and a payload, and perform structured alignment;
Step 11: for the header, extract the network-layer IPv4 header and the transport-layer TCP/UDP header information as the header representation; for the payload, fix the payload length of each data packet to a preset value, truncating any payload that exceeds this length and padding any payload that falls short of it;
Step 12: represent each session as an aligned header sequence and payload sequence, taking the first N data packets of the session as its vector representation and padding the sequence if the session contains fewer than N data packets; finally, map all bytes to integers in [0, 255] to form the final session vector representation.

4. The encrypted traffic classification method according to claim 3, characterized in that, in Step 11, the IPv4 header comprises a 20-byte fixed part and an options field of up to 40 bytes and is uniformly aligned to 60 bytes, with any shortfall padded; the TCP header is aligned to 60 bytes and the UDP header is fixed at 8 bytes; each header is uniformly represented as a 128-byte vector; and the MAC address, IP address, and port number fields are set to 0.

5. The encrypted traffic classification method according to claim 3, characterized in that, in Step 1, the traffic enhancement strategy takes the original header sequence and payload sequence as input and outputs the enhanced header sequence and payload sequence; specifically, the header and payload of each data packet in the session are processed iteratively, and the following three types of enhancement strategies are applied at random, each with its own probability:
Perturbation strategy: with a given probability, header fields are randomly modified and random bytes are inserted into the payload, to simulate network noise or packet injection attacks;
Masking strategy: with a given probability, random fields in the header are masked and part of the payload data is randomly discarded, to simulate hidden protocol fields or partial loss of payload data;
Rearrangement strategy: with a given probability, the order of selected fields in the header is randomly swapped and the payload bytes are rearranged, to simulate field disorder or byte-level reassembly attacks.

6. The encrypted traffic classification method according to claim 2, characterized in that:
Step 30 specifically comprises: for labeled samples, supervised contrastive learning is employed to enhance the intra-class compactness and inter-class separability of the feature space; specifically, the original features and the enhanced features are fed into a projection head consisting of two fully connected layers and a ReLU activation function, which maps them into the contrastive embedding space to obtain the corresponding projected features; a supervised contrastive loss is applied within the projected feature space to pull similar samples closer together and push dissimilar samples apart; for each anchor sample, the samples of the same class form the positive sample set and the remaining samples form the negative sample set; the cosine similarity between features is computed first and then scaled by a temperature parameter; the contrastive loss function is defined in terms of these similarities and the number of labeled samples in the current batch;
Step 31 specifically comprises: for unlabeled samples, a consistency regularization method based on FixMatch is used; the original features and the enhanced features are fed into the classifier to obtain their respective predicted probability distributions, the classifier having a structure similar to that of the projection head with a Softmax layer at the end; in the prediction for the original features, the category whose confidence exceeds the threshold is selected as the pseudo-label, and the cross-entropy loss between this pseudo-label and the prediction for the strongly enhanced view is computed, constraining the model to produce consistent predictions for different views of the same unlabeled sample; the FixMatch loss is defined in terms of the number of unlabeled samples and an indicator function, such that the loss is computed only for samples with high-confidence pseudo-labels; a standard cross-entropy loss is applied to the original features of the labeled samples, defined in terms of the number of labeled samples, the prediction obtained from the original features, and the label y;
Step 32 specifically comprises: the model is trained by jointly optimizing the contrastive loss, the labeled-sample cross-entropy loss, and the FixMatch consistency loss; the total loss function combines these three losses using two hyperparameters that balance the contributions of the individual losses.

7. An encrypted traffic classification system based on semi-supervised contrastive learning, characterized in that it comprises: a memory, a processor, and a computer program stored in the memory, the computer program being configured to implement the steps of the encrypted traffic classification method according to any one of claims 1-6 when invoked by the processor.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program configured to implement the steps of the encrypted traffic classification method according to any one of claims 1-6 when invoked by a processor.