Malicious encrypted traffic detection method based on unsupervised domain-incremental learning
By constructing a category geometry through Mamba feature extraction and bidirectional symmetric contrastive loss, and combining a diagonal covariance prototype and a drift-aware update mechanism, the unsupervised domain incremental learning problem in malicious encrypted traffic detection is solved, achieving efficient malicious traffic detection and a low forgetting rate.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- NANJING TECH UNIV
- Filing Date
- 2026-04-11
- Publication Date
- 2026-06-19
AI Technical Summary
Existing technologies struggle to achieve continuous domain-incremental learning in malicious encrypted traffic detection when neither annotations nor historical data are available. They suffer from insufficient knowledge representation, an imbalance between responsiveness and stability in prototype updates, and coupled forgetting between the feature space and the decision space, all of which degrade model performance under distribution shift.
Mamba is used as the feature extraction backbone. A compact category geometry is constructed through bidirectional symmetric contrastive loss. Diagonal covariance prototypes are used as lightweight knowledge carriers. Combined with a drift-aware prototype update mechanism and a two-layer anchored anti-forgetting mechanism, unsupervised domain-incremental learning is achieved.
In a constantly changing network environment, the method detects malicious encrypted traffic with high accuracy and a low forgetting rate, optimizing the trade-off between performance and efficiency: average accuracy improves by 14%, and the average forgetting rate falls to 3.31%.
Figure CN122247727A_ABST
Abstract
Description
Technical Field
[0001] This invention belongs to the field of network security technology and applies artificial intelligence to network security. Specifically, it relates to a method for detecting malicious encrypted traffic based on unsupervised domain-incremental learning.
Background Technology
[0002] The payload content of malicious traffic encrypted by protocols such as Transport Layer Security (TLS) is no longer visible to the detection system, forcing intrusion detection to shift towards modeling traffic metadata and statistical characteristics [1]. Studies have shown that deep learning modeling methods have achieved high recognition accuracy in encrypted traffic detection under static or single-type network conditions [2] [3]. However, after a long-distance end-to-end TLS stream is transmitted through Ethernet, Wi-Fi or 5G networks, its packet length distribution, arrival interval and jitter characteristics will change significantly [4]. The resulting distribution shift makes it difficult for detection models trained in specific environments to be directly generalized to new deployment scenarios [5].
[0003] Domain Adaptation (DA) transfers learned knowledge from a source domain to a target domain with a different distribution [6], providing a feasible path to alleviate the aforementioned distribution shift. However, DA essentially deals with a one-time migration between two fixed domains, while drift in real-world networks is not a one-time event. For example, a remote worker may switch from an office wired network to home Wi-Fi, and then to a 5G mobile network while commuting. Likewise, an Intrusion Detection System (IDS) deployed at headquarters must cope with regional differences in network infrastructure when the business expands to overseas branches. Each change in the network environment constitutes a new distribution shift. In this process, the model must not only adapt to the distribution of the current domain, but also remain adaptable as the deployment environment continues to evolve [7]. This requirement constitutes the Domain-Incremental Learning (DIL) problem [8].
[0004] In real-world encrypted traffic detection, continuous DIL faces progressively tighter practical constraints. Unlike traditional DIL, where each new domain comes with labeled data [9], network flows arriving in new domains usually carry no label information. The model must therefore adapt to the distribution of new domains under completely unsupervised conditions. The problem is further constrained by the unavailability of historical data. Currently, the main means of suppressing forgetting is replay, which stores historical samples and replays them during training on new tasks [10]. However, the continuous generation of network flows makes long-term storage of historical data costly and exposes it to privacy compliance risks [11], making it difficult to revisit the original data of historical domains after domain switching. These constraints require the model to achieve both plasticity toward new domains and stability on historical domains when neither label guidance nor historical data is available. Beyond the constraints of the learning paradigm, feature modeling of encrypted traffic also strongly affects detection performance [12]. The discriminative information of encrypted traffic is implicit in structures such as the packet length sequence and inter-arrival times, so effectively capturing these long-range temporal dependencies is the key to feature extraction [13]. Intrusion detection also requires the model to adapt efficiently online whenever a new domain arrives, which imposes strict limits on inference latency and computational overhead [14].
[0005] While DIL provides a systematic framework for addressing continuous domain drift, applying it to malicious encrypted traffic intrusion detection still faces several key challenges in knowledge representation, catastrophic forgetting suppression, and efficient sequence modeling.
[0006] 1) Knowledge construction under unlabeled constraints. Supervised DIL can use label signals to construct classification boundaries [15], whereas unlabeled models must rely on the knowledge representation constructed in the source domain as the only carrier for cross-domain transfer [16]. DroidEvolver++ [17] and ADAPT [18] inherit the source domain classifier to generate pseudo-labels for unlabeled traffic. However, this fails to make full use of the opposing structural information between the normal and malicious categories. MCAKE [19] uses self-supervised contrastive learning to construct normal prototypes and then measures the deviation of samples from the prototypes to detect anomalies. However, it applies a unidirectional constraint with only the normal class as the anchor point; the lack of a direct constraint on the intra-class geometry of the abnormal class makes the abnormal class prototype less representative. AOC-IDS [20] constructs normal and malicious prototypes based on one-dimensional Gaussian distribution estimation for statistical decision. However, a one-dimensional Gaussian treats each feature dimension as independent, which may lose the correlation information between dimensions. The complete covariance matrix preserves these correlations, but when the feature dimension is $d$, storing it requires $O(d^2)$ space and evaluating distances against it costs at least $O(d^2)$ per sample.
[0007] 2) Imbalance between responsiveness and stability in prototype updates. After prototype knowledge is constructed, the model must track distribution changes and keep updating the prototype as the network environment changes [21]. Since the target domain has no label information, prototype updates can rely only on the decisions that the current prototype structure makes on incoming samples as implicit supervision. Because these decisions are inherently noisy, using them without screening causes cumulative contamination of the prototype [22]. In addition, different samples reflect very different degrees of domain drift. Samples that fall within the existing distribution range indicate that the current prototype still represents the data well, while samples far from the distribution center may indicate that the local distribution has shifted [23]. Existing methods (such as PASS [24] and CoPE [25]) generally adopt fixed mean or momentum update strategies, which struggle to balance an adequate response to drift against stable updates.
[0008] 3) Coupled forgetting between the feature space and the decision space. Forgetting in the DIL process stems from the drift of both model parameters and knowledge representation. The former causes the feature representations of historical domain samples to deviate from their original positions; the latter erodes the decision boundaries of the historical domains. The two kinds of forgetting are coupled: feature shift invalidates the established decision structure, and drift of the decision structure in turn misleads the encoder's update direction. Parameter regularization methods (such as EWC [26] and SI [27]) penalize changes to parameters that are important to historical tasks. In addition, Li et al. [28] used the soft output of the old model as a distillation signal to constrain the deviation of the new model from historical knowledge. SSF [7] and MalCL [29] maintain a memory buffer and replay historical samples to suppress model parameter shift. None of these methods protects the structural integrity of the knowledge representation in the decision space.
[0009] Terminology explanation: In cross-domain knowledge transfer, a "prototype" is a central representation of a category or domain, typically aggregated from the mean or core features of samples within that category or domain. It characterizes the shared structure between the source and target domains, helping to mitigate cross-domain distribution differences. In cross-domain learning, prototypes serve as a basis for pseudo-label generation and a reference point for learning decision boundaries, facilitating effective knowledge transfer from the source to the target domain.
Summary of the Invention
[0010] To address the aforementioned technical issues, this invention proposes a malicious encrypted traffic detection method based on unsupervised domain incremental learning, applicable to the detection of malicious encrypted traffic in a continuously evolving real-world environment.
[0011] The malicious encrypted traffic detection method of the present invention uses a traffic detection model to detect normal traffic and malicious traffic from the input encrypted traffic.
[0012] In scenarios involving changes in the network environment, the traffic detection model uses a prototype structure as a carrier for cross-domain knowledge transfer and learns within the unsupervised domain incremental learning (DIL) framework.
[0013] The traffic detection model uses Mamba as the feature extraction backbone.
[0014] In the initial training phase of the model in the source domain, the normal class and the malicious class are driven to form a compact and separate geometric structure in the latent space through bidirectional symmetric contrastive loss (BSC), and the diagonal covariance prototype is used as a lightweight cross-domain category knowledge carrier.
[0015] During the target domain continuous adaptation phase after the model is deployed to the target domain, a drift-aware prototype update mechanism based on standardized Mahalanobis distance is adopted, combined with decision confidence screening and dynamic momentum adjustment.
[0016] The anti-forgetting method in the learning process adopts a collaborative anti-forgetting mechanism with two layers of anchoring from encoder parameters and decision space.
[0017] The main technical contributions of this invention include:
[0018] (1) A contrastive learning method based on bidirectional symmetry drives intra-class aggregation and inter-class separation of the normal and malicious classes through a bidirectional anchoring mechanism. In addition, diagonal covariance is introduced to replace the complete covariance matrix when constructing category prototypes, providing a lightweight and discriminative knowledge carrier for online prototype updates during the domain-incremental phase.
[0019] (2) The drift-aware adaptive prototype update mechanism removes uncertain samples near the decision boundary to suppress noise contamination and dynamically adjusts the update step size based on the standardized Mahalanobis distance. Extreme outlier samples that exceed the confidence boundary are truncated to protect the stability of the prototype, so that the prototype is updated reliably while still tracking distribution drift.
[0020] (3) The collaborative anti-forgetting mechanism of features and decisions uses knowledge distillation at the feature layer to constrain the update magnitude of encoder parameters and prevent the degradation of feature extraction capability, and introduces prototype anchoring regularization at the decision layer to maintain the geometric stability of the decision space. Together, the two constraints suppress knowledge drift at both the feature extraction layer and the decision layer.
Description of the Drawings
[0021] Figure 1 shows the framework for source domain prototype construction and target domain adaptive prototype update.
[0022] Figure 2 compares the BSC loss, the InfoNCE loss, and the CRC loss.
[0023] Figure 3 illustrates the combined anti-forgetting mechanism of encoder distillation and prototype anchoring.
[0024] Figures 4(a) to 4(e) show the continual DIL performance matrix of each method: Figure 4(a) is Proposed-1, Figure 4(b) is SSF, Figure 4(c) is AOC-IDS, Figure 4(d) is EWC, and Figure 4(e) is LwF.
[0025] Figure 5 compares the average forgetting rates.
[0026] Figure 6 compares the average accuracy under DIL.
[0027] Figures 7(a) to 7(f) show t-SNE visualizations of the feature space under different contrastive losses and during domain-incremental adaptation: Figure 7(a) is the source domain (BSC loss), Figure 7(b) is after Replay-1, Figure 7(c) is after Replay-2, Figure 7(d) is after Replay-3, Figure 7(e) is the InfoNCE loss, and Figure 7(f) is the CRC loss.
[0028] Figure 8 compares the number of parameters and the adaptation time of each method.
[0029] Figure 9 presents a trade-off analysis between the detection performance and inference speed of each method.
Detailed Implementation
[0030] The present invention will be further described below with reference to the accompanying drawings and specific embodiments.
[0031] 1. Overview
[0032] As employees frequently switch between heterogeneous access environments such as wired networks, Wi-Fi, and 5G, the statistical characteristics of encrypted traffic are constantly shifting in distribution. Constrained by privacy protection and the cost of manual annotation, new domain data lacks annotation information, and historical domain data is difficult to store and revisit long-term. To address these challenges, this invention proposes an unsupervised domain incremental learning method for malicious encrypted traffic detection.
[0033] This method uses Mamba as the feature extraction backbone to capture long-range temporal dependencies of traffic sequences with linear complexity.
[0034] In the source domain stage, a highly discriminative feature geometry is established through bidirectional symmetric contrastive loss, and a diagonal covariance prototype is used as a lightweight cross-domain knowledge carrier.
[0035] During the continuous adaptation phase in the target domain, a drift-aware prototype update mechanism based on standardized Mahalanobis distance is proposed, which is combined with decision confidence screening and dynamic momentum adjustment to ensure that the prototype is protected from the accumulation of noise samples while tracking distribution changes.
[0036] A two-layer anchored collaborative anti-forgetting mechanism is constructed from encoder parameters and decision space.
[0037] On a real network environment dataset covering four countries and three access methods, the method achieved an average accuracy of 88.50%, which is more than 14% higher than the best baseline, with an average forgetting rate of only 3.31%, achieving the optimal trade-off between performance and efficiency.
[0038] 2. Scheme Design
[0039] Suppose that the network environment change scenario is abstracted as follows:
[0040] The set of network environments is $\mathcal{E} = \{E_0, E_1, \ldots, E_K\}$, where $E_k$ denotes the $k$-th environment. The dataset in the $k$-th environment is $D_k = \{x_i^k\}_{i=1}^{N_k}$, where $x_i^k$ is the feature vector of a traffic sample and $N_k$ is the total number of samples. The samples in the source domain $D_0$ have complete benign/malicious labels. For the target domains $D_1, \ldots, D_K$ that arrive subsequently, label information is completely unavailable. There are significant distribution differences between domains, i.e., for $i \neq j$, $P_i(x) \neq P_j(x)$. This distribution shift is the root cause of the degradation in cross-domain detection performance. $p_n^k$ and $p_a^k$ denote the normal and malicious prototype vectors in the $k$-th environment, respectively.
[0041] This section uses the prototype structure as the carrier for cross-domain knowledge transfer and constructs a DIL framework at three levels: source domain prototype construction, target domain adaptive updating, and anti-forgetting, as shown in Figure 1.
[0042] 2.1 Source Domain Prototype Construction
[0043] The goal of the source domain training phase is to drive the encoder to form a discriminative class geometry in the latent space. Normal and malicious traffic prototypes are explicitly initialized, providing a high-quality, structured starting point for subsequent continuous adaptation to the unlabeled target domain.
[0044] 1) Bidirectional symmetric contrastive representation learning. Let the numbers of normal and malicious samples in a batch be $N_n$ and $N_a$, respectively, and denote the latent-space feature vector sets of the two classes as $Z_n = \{z_i^n\}_{i=1}^{N_n}$ and $Z_a = \{z_j^a\}_{j=1}^{N_a}$. The similarity between samples is measured using cosine similarity. The inter-class similarity between normal and abnormal samples within a batch is calculated as follows:
[0045]
[0046] where $\tau$ is the temperature hyperparameter.
[0047] As shown in Figure 2, the InfoNCE loss [30] constructs positive pairs from augmented views of a single sample, while the remaining samples in the batch are all treated as negatives, so some samples of the same class are inevitably pushed away incorrectly. Similarly, the CRC loss [20] applies a one-way constraint with only the normal class as the anchor point; the lack of a direct constraint on the intra-class geometry of the abnormal class impairs prototype representativeness. Unlike the CRC loss, which anchors only on the normal class, this section designs a novel bidirectional symmetric contrastive (BSC) loss that uses two anchor points to constrain intra-class clustering and inter-class separation. The normal class anchor loss is defined as
[0048]
[0049] where $i'$ indexes a traffic sample distinct from $i$.
[0050] Because cosine similarity is symmetric, the inter-class similarity is the same whichever class serves as the anchor, so this term can be shared between the normal and abnormal classes. The anchor loss for the abnormal class is therefore defined as
[0051]
[0052] The BSC loss used in source domain training consists of the abnormal and normal class anchor losses, i.e.
[0053]
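The equations for the anchor losses are not reproduced in this text, so the following is only a minimal sketch of one loss with the properties described above: two anchors, a shared inter-class term, and intra-class sums over pairs $i' \neq i$. The specific log-ratio form, the unweighted sum of the two anchor losses, and the names `pair_sum` and `bsc_loss` are assumptions made for illustration, not the patented formulas.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pair_sum(xs, ys, tau, skip_self=False):
    """Sum of exp(cos / tau) over all pairs; skip i == j for intra-class sums."""
    total = 0.0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if skip_self and i == j:
                continue
            total += math.exp(cosine(x, y) / tau)
    return total

def bsc_loss(z_normal, z_abnormal, tau=0.1):
    """Assumed BSC form: one anchor loss per class, sharing the inter-class term.

    Each class needs at least two samples so the intra-class sum is non-zero.
    """
    inter = pair_sum(z_normal, z_abnormal, tau)  # identical for both anchors
    intra_n = pair_sum(z_normal, z_normal, tau, skip_self=True)
    intra_a = pair_sum(z_abnormal, z_abnormal, tau, skip_self=True)
    loss_n = -math.log(intra_n / (intra_n + inter))  # normal-class anchor
    loss_a = -math.log(intra_a / (intra_a + inter))  # abnormal-class anchor
    return loss_n + loss_a
```

Under this assumed form, a batch whose two classes are well separated yields a much smaller loss than a batch in which the same vectors are assigned to mixed classes, which is the behavior the bidirectional constraint is meant to reward.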
[0054] 2) Prototype distribution modeling. After source domain training converges, the encoder has formed a discriminative class geometry in the latent space. Let the sets of normal and malicious samples in the source domain be $S_n$ and $S_a$, and let $f_\theta$ denote the encoder. The normal traffic prototype and the abnormal traffic prototype are initialized to
[0055] $$p_n = \frac{1}{|S_n|}\sum_{x \in S_n} f_\theta(x), \qquad p_a = \frac{1}{|S_a|}\sum_{x \in S_a} f_\theta(x) \tag{5}$$
[0056] However, the prototype is essentially a single point in the feature space, and judging samples by their distance to it implies a spherical boundary that ignores how differently same-class samples disperse across feature dimensions. The feature distributions of normal and malicious traffic are usually not isotropic [23]: they are compact in some dimensions but exhibit large random fluctuations in others. If a uniform spherical boundary is used for judgment, a large number of misjudgments will inevitably occur in the dispersed dimensions. Therefore, using the prototype vector as the sole representation of the category has obvious limitations.
[0057] To address the aforementioned issues, this invention further models the distribution range of each category in the feature space on the basis of the prototype vector. For a sample with feature vector $z$ in a class with prototype $p$, the direction and magnitude of its deviation from the class center in each feature dimension are denoted by the difference vector $\delta = z - p$. The statistics of the difference vectors of all samples of the same class naturally depict the distribution pattern of the category in the high-dimensional feature space; this pattern can be accurately described by the complete covariance matrix, corresponding to an ellipsoid centered on the prototype [31]. However, after training in the source domain, the feature space already shows an obvious intra-class clustering structure. The intra-class differences are mainly reflected in the amplitude changes in each dimension, while the correlation between dimensions is relatively weak. Therefore, the feature dimensions can be treated as mutually independent, that is, the off-diagonal elements of the covariance matrix can be ignored. This diagonal approximation holds without significantly losing the ability to describe the distribution [32]. To reduce the computational complexity, diagonal covariance is introduced to approximate the spatial distribution of each category. The diagonal covariance of the malicious traffic category can be calculated as follows:
[0058] $$\sigma_a^2 = \frac{1}{|S_a|}\sum_{x \in S_a}\left(f_\theta(x) - p_a\right) \odot \left(f_\theta(x) - p_a\right) \tag{6}$$
[0059] where $S_a$ is the set of malicious samples in the source domain, $f_\theta$ is the encoder, $p_a$ is the malicious traffic prototype, and $\odot$ represents the element-wise product. Similarly, the diagonal covariance vector of the normal traffic category is $\sigma_n^2$. The diagonal approximation reduces the storage overhead from $O(d^2)$ to $O(d)$, where $d$ is the feature dimension. In the decision-making process, weighted Euclidean distance is used in place of the full Mahalanobis distance, and the computational complexity is likewise $O(d)$.
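As a rough illustration of the prototype described above, the sketch below computes a class mean and a per-dimension variance from a list of feature vectors. The function name `build_prototype` and the plain-list representation are assumptions; in practice the features would come from the trained encoder, and a small floor on the variances may be needed to avoid division by zero later.

```python
def build_prototype(features):
    """Return (mean, diag_var) for a non-empty list of equal-length vectors.

    Only 2 * d numbers are stored per class, instead of the d * d entries
    that a complete covariance matrix would need.
    """
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[m] for f in features) / n for m in range(dim)]
    # Mean of squared deviations per dimension: the covariance diagonal.
    var = [sum((f[m] - mean[m]) ** 2 for f in features) / n for m in range(dim)]
    return mean, var
```

For example, `build_prototype([[1.0, 2.0], [3.0, 6.0]])` gives the mean `[2.0, 4.0]` and the diagonal variance `[1.0, 4.0]`.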
[0060] 2.2 Sample Decision and Prototype Update in the Target Domain
[0061] After initializing the source domain prototype structure, the model is adapted sequentially to the unlabeled target domain sequence. Under the constraint of having no historical data to review, the quality of the prototype structure update determines the model's detection capability in new domains. To address the dual threats of noisy decision-making and heterogeneous drift, this invention proposes an adaptive prototype update mechanism based on sample-level drift awareness.
[0062] 1) Sample decision and selection based on Mahalanobis distance. Since the prototype structure maintains a diagonal covariance over the feature dimensions, Mahalanobis distance is used as the distance metric between a sample and a prototype [31]. Let $z_t$ be the feature vector of the $t$-th sample arriving in the $k$-th target domain. The Mahalanobis distance between $z_t$ and the normal prototype $p_n^k$ is calculated as
[0063] $$D_n(z_t) = \sum_{m=1}^{d} \frac{\left(z_{t,m} - p_{n,m}^k\right)^2}{\left(\sigma_{n,m}^k\right)^2} \tag{7}$$
[0064] where $z_{t,m}$, $p_{n,m}^k$, and $\left(\sigma_{n,m}^k\right)^2$ are the $m$-th components of the corresponding vectors and $d$ is the feature dimension. Mahalanobis distance lets the metric adaptively reflect the degree of intra-class dispersion in each dimension. Similarly, the Mahalanobis distance between $z_t$ and the malicious prototype $p_a^k$ is $D_a(z_t)$. Accordingly, the decision for the sample is formalized as the following binary variable:
[0065] $$\hat{y}_t = \begin{cases} 1, & D_a(z_t) < D_n(z_t) \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
[0066] where $\hat{y}_t = 1$ indicates that the sample is identified as malicious traffic; otherwise it is considered normal traffic.
[0067] The reliability of the decision result directly determines whether it can be used to drive prototype updates. Samples near the decision boundary have similar Mahalanobis distances to the two prototypes, so their decisions are highly uncertain. If such decisions are forcibly used to drive prototype updates, incorrectly classified samples will continuously contaminate the prototype. Therefore, the absolute difference between the sample's Mahalanobis distances to the two prototypes is used to measure the decision confidence:
[0068] $$c_t = \left| D_n(z_t) - D_a(z_t) \right| \tag{9}$$
[0069] where $D_n(z_t)$ and $D_a(z_t)$ are the Mahalanobis distances from sample $z_t$ to the normal and malicious prototypes. Given a decision confidence threshold $\gamma$, only high-confidence samples with $c_t > \gamma$ are included in the subsequent prototype update; low-confidence samples retain their decision results for detection output only and do not participate in prototype calibration.
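The decision and screening steps above can be sketched as follows. This is a minimal illustration, assuming each prototype is a `(mean, diag_var)` pair with strictly positive variances and that the distance is the squared, variance-weighted form (consistent with the later remark that its expected value grows linearly with the feature dimension); the names `mahalanobis_sq`, `decide`, and `gamma` are hypothetical.

```python
def mahalanobis_sq(z, mean, var):
    """Squared Mahalanobis distance under a diagonal covariance (O(d) work)."""
    return sum((zi - mi) ** 2 / vi for zi, mi, vi in zip(z, mean, var))

def decide(z, normal, malicious, gamma):
    """Classify one feature vector and flag whether it may update a prototype.

    Returns (label, confidence, usable): label is 1 for malicious and 0 for
    normal; usable is True only when the confidence exceeds gamma.
    """
    d_n = mahalanobis_sq(z, *normal)
    d_a = mahalanobis_sq(z, *malicious)
    label = 1 if d_a < d_n else 0
    confidence = abs(d_n - d_a)
    return label, confidence, confidence > gamma
```

A sample that sits halfway between the two prototypes gets a confidence near zero, so it still receives a label for detection output but is kept out of prototype calibration.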
[0070] 2) Drift-aware prototype update. Although the category of a sample that passes decision confidence screening is clear, its contribution to the prototype update still needs to be adjusted at a finer granularity. The degree to which a sample deviates from the existing prototypes is characterized by its minimum Mahalanobis distance, i.e.
[0071] $$D_{\min}(z_t) = \min\{D_n(z_t),\, D_a(z_t)\} \tag{10}$$
[0072] where $D_n(z_t)$ and $D_a(z_t)$ are the Mahalanobis distances from the sample to the normal and malicious prototypes. The smaller $D_{\min}$ is, the better the current prototype characterizes the sample, and the prototype should remain stable; conversely, the higher the degree of drift, the faster the prototype should respond. However, if every sample with a large $D_{\min}$ were updated with a large step size, extreme outliers would contaminate the prototype. Therefore, this invention introduces a cutoff threshold based on the standardized Mahalanobis distance. Note that the expected value of $D_{\min}$ grows linearly with the feature dimension $d$, which makes it difficult to set a uniform threshold. To eliminate this coupling with the dimension, $D_{\min}$ is standardized to $\tilde{D}_{\min}$, which approximately follows a standard normal distribution, and the cutoff threshold is taken as the standard normal quantile $z_{1-\alpha}$ corresponding to the confidence level $1-\alpha$. The drift-aware momentum coefficient $\eta_t$ is therefore formalized as
[0073]
[0074] where $\beta$ is the drift sensitivity hyperparameter. $\eta_t$ is valid only when the sample both passes decision confidence screening and satisfies $\tilde{D}_{\min} \le z_{1-\alpha}$. In that case $\eta_t$ decreases monotonically with the degree of distribution deviation, so the more significant the drift, the larger the update step size. A sample with $\tilde{D}_{\min} > z_{1-\alpha}$ is instead judged to be an extreme sample that exceeds the confidence boundary and is truncated.
[0075] For samples meeting the above dual-screening criteria, the prototype update applies only to the category to which the sample is assigned; the other prototype remains unaffected. Assuming the current sample $z_t$ is classified as normal, the normal prototype of the $k$-th domain is updated to
[0076] $$p_n^k \leftarrow \eta_t\, p_n^k + (1 - \eta_t)\, z_t \tag{12}$$
[0077] and the diagonal covariance is synchronously updated to
[0078] $$\left(\sigma_n^k\right)^2 \leftarrow \eta_t \left(\sigma_n^k\right)^2 + (1 - \eta_t)\left(z_t - p_n^k\right) \odot \left(z_t - p_n^k\right) \tag{13}$$
[0079] where $\eta_t$ is the drift-aware momentum coefficient. The covariance and prototype updates share the same momentum coefficient to ensure that the estimated distribution range and the calibration of the distribution center are consistent in update magnitude.
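The exact standardization and momentum formulas are not reproduced in this text, so the sketch below fills them with stated assumptions: the standardization uses the common normal approximation of a chi-square variable, $(D_{\min} - d)/\sqrt{2d}$, and the momentum is a hypothetical exponential decay `eta0 * exp(-beta * max(d_std, 0))`. Only the overall behavior (truncate extreme samples, take larger steps under larger drift, share one coefficient between mean and variance) follows the description above.

```python
import math
from statistics import NormalDist

def standardized_distance(d_min, dim):
    """Assumed chi-square normal approximation: mean dim, variance 2 * dim."""
    return (d_min - dim) / math.sqrt(2.0 * dim)

def drift_momentum(d_min, dim, beta=0.5, eta0=0.9, alpha=0.05):
    """Return the momentum coefficient, or None for a truncated extreme sample.

    beta, eta0 and the exponential form are illustrative assumptions.
    """
    d_std = standardized_distance(d_min, dim)
    cutoff = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    if d_std > cutoff:
        return None
    return eta0 * math.exp(-beta * max(d_std, 0.0))

def update_prototype(mean, var, z, eta):
    """Momentum update of the decided class only; mean and variance share eta.

    The variance term uses the deviation from the pre-update mean (a choice
    made here for the sketch).
    """
    new_mean = [eta * m + (1.0 - eta) * zi for m, zi in zip(mean, z)]
    new_var = [eta * v + (1.0 - eta) * (zi - m) ** 2
               for v, m, zi in zip(var, mean, z)]
    return new_mean, new_var
```

With these assumed settings, a sample at the expected distance keeps the base momentum, a moderately drifted sample gets a smaller momentum (hence a larger step), and a far outlier returns `None` and leaves the prototype untouched.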
[0080] 2.3 Collaborative Anti-Forgetting Mechanism of Dual Anchor Points
[0081] Encoder parameters shift under the continuous influence of the target domain feature distribution, causing the feature representations of historical domain samples to deviate from their original positions. Furthermore, the prototype continuously drifts as it adapts to the new domain distribution, eroding the decision boundaries of the historical domains. This process risks both failing to acquire new knowledge in time and forgetting key information. This invention therefore applies constraints at both the encoder parameter level and the prototype structure level, constructing a two-layer anchored anti-forgetting mechanism, as shown in Figure 3.
[0082] 1) Encoder knowledge distillation. At each domain switch, the encoder parameters obtained from the previous domain's adaptation are frozen and saved as the teacher model, while the encoder being adapted to the current domain serves as the student model. Let the teacher and student model parameters be $\theta_T$ and $\theta_S$, respectively. For the $t$-th sample $x_t$ arriving in the target domain, the distillation loss is
[0083]
[0084] As each sample arrives, Equation (14) continuously constrains the student features $f_{\theta_S}(x_t)$ from deviating from the teacher features $f_{\theta_T}(x_t)$, ensuring that the feature representations of historical domain samples do not deviate excessively under the current encoder. At each domain switch, $\theta_T$ is synchronously updated to the encoder state at the end of the previous domain's adaptation. This process does not require access to historical domain data.
[0085] 2) Prototype structure anchoring. Although constraint (14) ensures the stability of feature extraction during the DIL process, it cannot directly protect the structural integrity of the decision space. Therefore, before entering the $k$-th target domain, the prototypes of the previous domain are frozen and saved as historical anchor points, and a prototype structure anchoring constraint is applied to the encoder throughout the adaptation to the current domain. Let $\bar{p}_n^{k-1}$ and $\bar{p}_a^{k-1}$ denote the historical anchor prototypes of the $(k-1)$-th domain. If sample $x_t$ is decided to be in the normal category, the prototype anchoring regularization term is:
[0086]
[0087] Equation (15) makes the sample features generated by the current encoder continuously approach the historical anchor prototype of the decision, which effectively prevents the decision space from collapsing due to the drastic drift of the feature space when the encoder adapts to the new domain.
[0088] For the corresponding encoder and prototype structure constraints, the joint constraint loss composed of equations (14) and (15) is:
[0089]
[0090] where the weighting coefficients are hyperparameters. When a sample arrives, the encoder updates its parameters by minimizing this joint loss.
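Equations (14) to (16) are not reproduced in this text. The sketch below shows one way the two constraints could be combined, assuming squared Euclidean distance for both the feature-level distillation term and the prototype anchoring term, and a weighted sum with hypothetical weights `lam_kd` and `lam_pa`. It is an illustration of the two-layer idea, not the patented loss.

```python
def squared_l2(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def joint_loss(student_feat, teacher_feat, anchor_normal, anchor_malicious,
               label, lam_kd=1.0, lam_pa=1.0):
    """Assumed joint anti-forgetting loss for one arriving sample.

    student_feat / teacher_feat: features of the same sample under the
    current encoder and the frozen previous-domain encoder.
    anchor_normal / anchor_malicious: frozen prototypes of the previous domain.
    label: decision for the sample (1 = malicious, 0 = normal).
    """
    distill = squared_l2(student_feat, teacher_feat)      # feature-layer anchor
    anchor = anchor_malicious if label == 1 else anchor_normal
    anchoring = squared_l2(student_feat, anchor)          # decision-layer anchor
    return lam_kd * distill + lam_pa * anchoring
```

In a real training loop this value would be computed on tensors and backpropagated through the student encoder only; the teacher features and the anchor prototypes stay frozen.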
[0091] Algorithm 1 provides the complete execution flow of the proposed method.
[0092]
[0093] 3. Experimental Preparation
[0094] 3.1 Experimental Data Acquisition and Processing
[0095] To simulate the cross-regional and cross-access method detection scenarios of multinational enterprise IDS, this experiment constructed a heterogeneous real-world environment dataset. This dataset was obtained by replaying TLS encrypted traffic from CIRA-CIC-DoHBrw-2020 between senders and receivers in four different countries. It covers three access methods: Wi-Fi, 5G, and Ethernet, and contains 15,000 malicious and 15,000 benign TLS flows (see Table 1 for details). Replay-0, as the fully labeled source domain, was used for initial model training, while Replay-1 through Replay-3, as completely unlabeled target domains, participated in continuous adaptation sequentially. The training and test sets for each domain were divided in an 8:2 ratio, with the test set remaining constant throughout the adaptation process to evaluate the model's ability to maintain detection capabilities across historical domains.
[0096] Table 1 Real-world traffic replay configuration
[0097]
[0098] In terms of traffic representation, the first five IP packets of each TLS flow are extracted sequentially, with the header and payload normalized to 80 bytes and 240 bytes respectively. If the length is insufficient, it is padded with zeros; if it exceeds the limit, it is truncated. Each flow is ultimately reshaped into a 40×40 two-dimensional matrix, where each 8 rows correspond to a packet. The first two rows encode the header information, and the last six rows encode the payload content. This matrix is then divided into a fixed-size patch sequence, with each patch aggregating local bytes and protocol structure information. The Mamba encoder models along the patch sequence.
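The byte layout described above can be sketched as follows: five packets, each contributing 80 header bytes and 240 payload bytes, give 5 × 320 = 1600 bytes, which fill a 40×40 matrix with eight rows per packet. The function names and the zero-filling of flows with fewer than five packets are assumptions; the patch division and any value normalization are left out.

```python
HEADER_LEN, PAYLOAD_LEN, NUM_PACKETS, WIDTH = 80, 240, 5, 40

def fit(data, length):
    """Truncate or zero-pad a bytes object to exactly `length` bytes."""
    return data[:length] + bytes(max(0, length - len(data)))

def flow_to_matrix(packets):
    """Build the 40x40 byte matrix for one flow.

    packets: list of (header_bytes, payload_bytes) tuples in arrival order.
    Rows 8p and 8p+1 hold the header of packet p; rows 8p+2..8p+7 its payload.
    """
    flat = b""
    for i in range(NUM_PACKETS):
        # Assumption: missing packets are represented by all-zero rows.
        header, payload = packets[i] if i < len(packets) else (b"", b"")
        flat += fit(header, HEADER_LEN) + fit(payload, PAYLOAD_LEN)
    return [list(flat[r * WIDTH:(r + 1) * WIDTH]) for r in range(WIDTH)]
```

The resulting list of rows can be converted to an array and split into fixed-size patches before being fed to the sequence encoder.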
[0099] 3.2 Comparison Method
[0100] To comprehensively evaluate the effectiveness of the method under continual DIL, this experiment selects comparison methods from two areas: general-purpose continual learning (CL) and intrusion detection.
[0101] Baseline methods for general-purpose CL. As a representative of regularization-based CL methods, EWC [26] uses the Fisher information matrix to estimate the importance of each parameter to the historical task and applies elastic penalties to important parameters to suppress catastrophic forgetting. LwF [28] uses the soft output of the old model as the distillation target when adapting to the new domain, maintaining performance on the historical task without accessing the historical data.
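As a brief illustration of the elastic penalty used by the EWC baseline, the sketch below evaluates the standard quadratic form, in which each parameter's squared change is weighted by its diagonal Fisher information. The function name, the flat-list parameter representation, and the `strength` argument are illustrative choices, not details taken from the experiments.

```python
def ewc_penalty(params, old_params, fisher_diag, strength):
    """Quadratic EWC penalty: 0.5 * strength * sum_i F_i * (theta_i - theta_old_i)^2.

    params, old_params, fisher_diag: equal-length sequences of floats.
    Parameters with large Fisher values are pulled back more strongly.
    """
    total = 0.0
    for theta, theta_old, fisher in zip(params, old_params, fisher_diag):
        total += fisher * (theta - theta_old) ** 2
    return 0.5 * strength * total
```

This term is added to the new-domain training loss, so a parameter judged important for the old task is expensive to move while an unimportant one remains free to adapt.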
[0102] Baseline methods for intrusion detection. As a representative of semi-supervised CL, SSF [7] uses the KS test to detect distribution drift in the data and combines strategic sample selection with a memory replay mechanism to achieve continuous domain adaptation. As a representative of unsupervised CL, AOC-IDS [20] constructs normal-class prototypes based on contrastive representation learning and makes statistical decisions by fitting a double Gaussian distribution offline.
[0103] Ablation variants of the proposed method. To quantitatively verify the independent contribution of each module, the following ablation variants were designed: removing the anomaly-class (malicious-class) anchor loss from the BSC loss; removing the decision filtering mechanism (w/o filtering); removing prototype structure anchoring; removing encoder knowledge distillation; and removing both anti-forgetting constraints simultaneously.
[0104] 3.3 Evaluation Indicators
[0105] Detection performance metrics. Accuracy, precision, recall, and F1 score are used to evaluate the model's single-domain detection capability in each target domain, to comprehensively measure the model's accuracy and coverage in identifying malicious traffic.
[0106] CL performance metrics. Single-domain detection metrics cannot comprehensively evaluate the model's overall ability to accumulate and retain knowledge during sequence domain learning, so two CL-specific metrics are introduced.
[0107] Average accuracy (AA) is defined as
[0108]
[0109] where the averaged term denotes the accuracy on each domain's test set, measured when the model has continued learning up to that domain. AA reflects the model's real-time learning performance in each domain throughout the entire CL process.
[0110] Average forgetting (AF) is defined as
[0111]
[0112] where the compared term denotes the accuracy obtained when the model, having continued learning through the last domain, is tested again on each earlier domain. The smaller the AF, the stronger the model's ability to retain knowledge.
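The two metrics can be computed from an accuracy matrix, as in the sketch below. The exact formulas are not reproduced in this text, so the definitions used here are assumptions inferred from the verbal descriptions: `acc[t][i]` is the accuracy on the test set of domain i after the model has adapted through stage t, AA averages the diagonal entries, and AF averages the drop from each diagonal entry to the final-stage accuracy on the same domain.

```python
def average_accuracy(acc):
    """Mean of acc[i][i]: accuracy on domain i right after learning domain i (assumed definition)."""
    n = len(acc)
    return sum(acc[i][i] for i in range(n)) / n


def average_forgetting(acc):
    """Mean drop from acc[i][i] to acc[last][i] over all domains before the last (assumed definition)."""
    last = len(acc) - 1
    return sum(acc[i][i] - acc[last][i] for i in range(last)) / last


# Hypothetical 3-stage example; entries above the diagonal are unused.
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.70, 0.00],
    [0.70, 0.60, 0.80],
]
aa = average_accuracy(acc)
af = average_forgetting(acc)
```

The numbers in `acc` are made up for illustration and are not experimental results.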
[0113] 3.4 Experimental Parameters
[0114] The values of each hyperparameter are shown in Table 2. To verify the effectiveness of the method design itself and its independence from the backbone network, the proposed framework was instantiated using three encoders: Mamba [33], Transformer [34], and MLP. As shown in Table 3, Proposed-1 is the complete method configuration of this invention, and Proposed-3 uses the same encoder type as SSF and AOC-IDS. As general CL baselines, EWC and LwF use an MLP as the backbone network and perform CL directly on the one-dimensional traffic features.
[0115] Table 2 Default Parameter Settings
[0116]
[0117] Table 3 Backbone networks of encoders using different methods
[0118]
[0119] 4 Results Analysis
[0120] 4.1 CL Performance Analysis
[0121] To comprehensively evaluate the detection performance and knowledge preservation capabilities of each method during the domain increment process, Figures 4(a) to 4(e) present the performance matrix of each method as a heatmap, where each row corresponds to a test domain and each column corresponds to an adaptation stage. The darker the color, the higher the detection performance. Figures 5 and 6 quantify AA and AF respectively, and Tables 4 to 6 provide supplementary comparisons in terms of precision, F1, and recall.
[0122] In Figures 4(b)–4(d), SSF, AOC-IDS, EWC, and LwF all approach the 50% random-guess level in the three target domains, while Proposed-1 significantly outperforms all the comparison methods. These results indicate that the bidirectional symmetric prototype structure constructed with the BSC loss generalizes its discriminative ability better across domains.
[0123] Both EWC and LwF failed catastrophically at the first domain switch, with source domain accuracy plummeting to approximately 49%; forgetting at this first switch dominates their AF. SSF's source domain performance was relatively stable, but forgetting accumulated over the first two domain switches, reflecting the gradual loss of representativeness of its replay mechanism under unlabeled conditions. AOC-IDS showed a different forgetting pattern, concentrated mainly in the Replay-1→Replay-2 stage: because its Gaussian parameters are fitted offline and then fixed, errors accumulated over successive domain shifts and the decision boundary drifted severely. Furthermore, AOC-IDS achieved a recall of 98.95% on Replay-1 but a precision of only 65.27%, further confirming that the fixed decision mechanism becomes inaccurate after domain shifts.
[0124] Proposed-1 ranked first in AA, improving on the second-best method by more than 14%; its AF was only 3.31%, with forgetting distributed evenly across all stages and no sudden degradation. On Replay-2 and Replay-3, its precision, F1 score, and recall all exceeded 87%, validating the synergy between the drift-aware prototype update and the dual-anchoring anti-forgetting mechanism.
[0125] Table 4 Precision comparison during the DIL process
[0126]
[0127] Table 5. F1 comparison during DIL process
[0128]
[0129] Table 6. Comparison of recall rates during the DIL process.
[0130]
[0131] 4.2 Feature Space Visualization Analysis
[0132] To visually demonstrate the discriminative nature of the BSC loss and the feature evolution during the DIL process, Figures 7(a) to 7(f) use t-SNE [35] to visualize the latent space features at each stage.
[0133] Figure 7(a) shows the source domain feature distribution after training with the BSC loss. The normal and malicious classes form two compact, well-separated clusters in the feature space, with clear inter-class boundaries and only a very small number of overlapping boundary samples. This indicates that the dual-anchor mechanism effectively drives intra-class clustering and inter-class separation, providing a high-quality structured starting point for subsequent prototype initialization. Figures 7(e) and 7(f) show the source domain feature distribution after training with the InfoNCE and CRC losses. Under the InfoNCE loss, each class splits into multiple separate sub-clusters and the classes overlap and mix over significant regions, which confirms the defect of wrongly repelling same-class samples as negatives. Under the CRC loss, the normal class remains about as compact as under BSC, but the malicious class is looser, which confirms the need for the anomaly-class (malicious-class) anchor constraint in the BSC loss.
[0134] Figures 7(b) to 7(d) show the feature evolution of Proposed-1 as it adapts sequentially to the three unlabeled target domains. From Figure 7(a) to 7(b), both class clusters diffuse to some degree and the number of mixed samples at the inter-class boundary increases markedly, corresponding to a substantial decrease in Replay-1 detection accuracy. From Figure 7(b) to 7(c), the feature distribution recovers, with tighter intra-class clustering and less boundary mixing, corresponding to a rebound in Replay-2 accuracy. This indicates that the drift-aware prototype update mechanism gradually recalibrates the prototype structure after the initial domain shift, restoring the encoder's feature extraction capability. From Figure 7(c) to 7(d), the distribution structure remains largely stable with only slight boundary blurring, corresponding to a minor decrease in Replay-3 accuracy. The evolution as a whole is a gradual drift: each domain preserves the basic topological structure of the source domain, and no structural collapse occurs.
[0135] 4.3 Ablation Experiment
[0136] Table 7 systematically ablates the proposed method from four dimensions, quantitatively verifying the independent contribution of each module.
[0137] Regarding the backbone network, AA decreases and AF increases in the order Mamba, Transformer, MLP. This indicates that Mamba has stronger expressive power for extracting structured features from two-dimensional traffic images, giving the source domain prototypes higher class discriminability at initialization. Notably, even with the MLP backbone, the AA of Proposed-3 is still significantly higher than that of AOC-IDS and SSF, which use the same backbone, suggesting that the performance gain stems mainly from the method design rather than from differences in encoder architecture.
[0138] After the anomaly-class (malicious-class) anchor loss was removed, both AA and AF degraded significantly, showing that a one-way contrastive constraint is insufficient for aggregating malicious-class features. After the filtering mechanism was removed, the increase in AF was particularly pronounced, suggesting that the absence of filtering not only impairs the accuracy of new-domain adaptation but also accelerates the forgetting of historical domain knowledge.
[0139] In the layer-by-layer ablation of the anti-forgetting constraints, removing either constraint alone (encoder knowledge distillation or prototype structure anchoring) caused a different degree of performance degradation, revealing the asymmetry in their functional division. Removing both constraints simultaneously led to a significant performance degradation, approaching the level of comparison methods without dedicated anti-forgetting designs, which validates the necessity of dual-anchored collaborative protection.
[0140] Table 8 compares the three covariance modeling methods. When decisions use only the prototype vectors with Euclidean distance, both AA and AF degrade significantly, indicating that differences in dispersion across dimensions materially affect decision accuracy. The full covariance method improves accuracy by less than 2% over the diagonal covariance method, while inference speed drops by nearly 30%. The diagonal approximation therefore gains a significant efficiency advantage at minimal accuracy cost, offering a better balance between accuracy and efficiency in online adaptation scenarios.
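The three distance variants compared in Table 8 can be contrasted with a small sketch. The function names and the `eps` stabilizer are illustrative assumptions; the point is only that the diagonal form rescales each dimension by its own variance at a cost linear in the feature dimension, whereas the full form requires solving a linear system with the whole covariance matrix.

```python
import numpy as np


def euclidean_sq(z, mu):
    """Prototype-only distance: ignores per-dimension dispersion."""
    diff = z - mu
    return float(diff @ diff)


def diag_mahalanobis_sq(z, mu, var, eps=1e-8):
    """Diagonal-covariance distance: O(D) per sample."""
    diff = z - mu
    return float(np.sum(diff * diff / (var + eps)))


def full_mahalanobis_sq(z, mu, cov):
    """Full-covariance distance: needs a D x D solve per sample."""
    diff = z - mu
    return float(diff @ np.linalg.solve(cov, diff))


# A class that is widely spread along the first dimension.
mu = np.zeros(2)
var = np.array([4.0, 1.0])
z = np.array([2.0, 0.0])
d_euclid = euclidean_sq(z, mu)
d_diag = diag_mahalanobis_sq(z, mu, var)
d_full = full_mahalanobis_sq(z, mu, np.diag(var))
```

In this toy case the Euclidean distance treats the sample as four units away, while both covariance-aware forms see it as one standard deviation from the center, which is how dispersion differences can change a nearest-prototype decision.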
[0141] Table 7 Ablation experiments of the proposed method
[0142]
[0143] Table 8 Comparison of Covariance Modeling Methods
[0144]
[0145] 4.4 Computational Efficiency Analysis
[0146] To evaluate the computational cost of the proposed method, this section compares the methods in terms of inference speed, number of model parameters, and continuous adaptation cost. As shown in Figure 8, Proposed-1 exhibits the best performance-efficiency trade-off among all methods. Although AOC-IDS, SSF, EWC, and LwF have high inference speeds, their AA is significantly lower than that of the proposed method. Proposed-3 maintains an inference speed similar to that of AOC-IDS while improving AA by approximately 18%, confirming that the proposed design also delivers significant performance gains with a lightweight backbone. Proposed-2, limited by the quadratic complexity of the Transformer self-attention mechanism, has the lowest inference speed among the three configurations, and its AA is 2.38% lower than that of Proposed-1. These results demonstrate that Mamba has a clear advantage in sequence modeling efficiency for encrypted traffic intrusion detection.
[0147] As shown in Figure 9, the proposed method has 2.18M parameters, significantly more than the MLP-based comparison methods. This increase stems from the sequence modeling capacity of the Mamba encoder itself rather than from additional overhead introduced by prototype updates or anti-forgetting mechanisms. Nevertheless, the domain adaptation time of the proposed method is much shorter than that of AOC-IDS. The latter must refit the double Gaussian distribution offline for each target domain, so its adaptation overhead accumulates continuously with the sample size. The prototype update and distillation constraints of the proposed method are performed online at the sample level, and the adaptation time is linearly related to the sample size rather than batch-dependent. Although the adaptation times of SSF, EWC, and LwF are similar to that of the proposed method, their AA lags behind by more than 10%, so their overall performance-efficiency trade-off is clearly less competitive.
[0148] 5. Summary
[0149] This invention presents an unsupervised continuous learning method to address the performance degradation caused by distribution shift and missing target domain annotations in encrypted traffic detection scenarios. The method uses Mamba as the feature extraction backbone, leveraging its selective state-space mechanism to model long-range dependencies in packet-length sequences and temporal intervals with linear complexity. In the source domain stage, the BSC loss drives the normal and malicious classes to form compact and separated geometric structures in the latent space, and a diagonal covariance prototype serves as a lightweight category knowledge carrier. In the target domain stage, decision filtering and adaptive prototype update mechanisms track distribution drift while suppressing the cumulative contamination of the prototypes by noisy samples. Furthermore, a dual-anchoring mechanism achieves collaborative anti-forgetting at both the encoder parameter and prototype structure levels. Experimental results show that the proposed method significantly outperforms the comparison methods in terms of average accuracy and average forgetting, and the computational efficiency analysis shows that it achieves the best performance-efficiency trade-off among the compared methods.
[0150] References
[0151] [1] Papadogiannaki E, Ioannidis S. A survey on encrypted network traffic analysis applications, techniques, and countermeasures[J]. ACM Computing Surveys (CSUR), 2021, 54(6): 1-35.
[0152] [2] Liu C, He L, Xiong G, et al. Fs-net: A flow sequence network for encrypted traffic classification[C]//IEEE INFOCOM 2019-IEEE Conference On Computer Communications. IEEE, 2019: 1171-1179.
[0153] [3] Lin X, Xiong G, Gou G, et al. ET-BERT: A contextualized datagram representation with pre-training transformers for encrypted traffic classification[C]//Proceedings of the ACM Web Conference 2022. 2022: 633-642.
[0154] [4] Xie R, Wang Y, Cao J, et al. Rosetta: Enabling robust tls encrypted traffic classification in diverse network environments with tcp-aware traffic augmentation[C]//Proceedings of the ACM turing award celebration conference-China 2023. 2023: 131-132.
[0155] [5] Cui S, Han X, Han D, et al. FG-SAT: Efficient flow graph for encrypted traffic classification under environment shifts[J]. IEEE Transactions on Information Forensics and Security, 2025.
[0156] [6] Tong V, Dao C, Tran H A, et al. Encrypted traffic classification through deep domain adaptation network with smooth characteristic function[J]. IEEE Transactions on Network and Service Management, 2025, 22(1): 331-343.
[0157] [7] Zhang X, Zhao R, Jiang Z, et al. Continual learning with strategic selection and forgetting for network intrusion detection[C]//IEEE INFOCOM 2025-IEEE Conference on Computer Communications. IEEE, 2025: 1-10.
[0158] [8] Lee I, Roh H, Lee W. Encrypted malware traffic detection using incremental learning[C]//IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2020: 1348-1349.
[0159] [9] Channappayya S, Tamma B R. Augmented memory replay-based continual learning approaches for network intrusion detection[J]. Advances in Neural Information Processing Systems, 2023, 36: 17156-17169.
[0160] [10] Li Z, Liu M, Wang P, et al. Multi-ARCL: Multimodal adaptive relay-based distributed continual learning for encrypted traffic classification[J]. Journal of Parallel and Distributed Computing, 2025, 201: 105083.
[0161] [11] Amalapuram S K, Tamma B R, Channappayya S S. Spider: A semi-supervised continual learning-based network intrusion detection system[C]//IEEE INFOCOM 2024-IEEE Conference on Computer Communications. IEEE, 2024: 571-580.
[0162] [12] Zhao R, Zhan M, Deng X, et al. A novel self-supervised framework based on masked autoencoder for traffic classification[J]. IEEE/ACM Transactions on Networking, 2024, 32(3): 2012-2025.
[0163] [13] Wang T, Xie X, Wang W, et al. Netmamba: Efficient network traffic classification via pre-training unidirectional mamba[C]//2024 IEEE 32nd International Conference on Network Protocols (ICNP). IEEE, 2024: 1-11.
[0164] [14] Wang X. Enidrift: A fast and adaptive ensemble system for network intrusion detection under real-world drift[C]//Proceedings of the 38th annual computer security applications conference. 2022: 785-798.
[0165] [15] Zhang X, Wang Y, Ohtsuki T, et al. Malware traffic classification via expandable class incremental learning with architecture search[J]. IEEE Transactions on Information Forensics and Security, 2025.
[0166] [16] Wei K, Yang X, Xu Z, et al. Class-incremental unsupervised domain adaptation via pseudo-label distillation[J]. IEEE Transactions on Image Processing, 2024, 33: 1188-1198.
[0167] [17] Kan Z, Pendlebury F, Pierazzi F, et al. Investigating labelless drift adaptation for malware detection[C]//Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security. 2021: 123-134.
[0168] [18] Alam M T, Piplai A, Rastogi N. ADAPT: A Pseudo-labeling Approach to Combat Concept Drift in Malware Detection[C]//2025 28th International Symposium on Research in Attacks, Intrusions and Defenses (RAID). IEEE, 2025: 693-712.
[0169] [19] Wang C, Qi Q, Wu J, et al. MCAKE: Memory-augmented autoencoder with contrastive learning for unsupervised anomaly detection[J]. ACM Transactions on Knowledge Discovery from Data, 2025, 19(8): 1-18.
[0170] [20] Zhang X, Zhao R, Jiang Z, et al. Aoc-ids: Autonomous online framework with contrastive learning for intrusion detection[C]//IEEE INFOCOM 2024-IEEE Conference on Computer Communications. IEEE, 2024: 581-590.
[0171] [21] Lin H, Zhang Y, Qiu Z, et al. Prototype-guided continual adaptation for class-incremental unsupervised domain adaptation[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 351-368.
[0172] [22] Zhang P, Zhang B, Zhang T, et al. Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 12414-12424.
[0173] [23] Yang L, Guo W, Hao Q, et al. CADE: Detecting and explaining concept drift samples for security applications[C]//30th USENIX Security Symposium (USENIX Security 21). 2021: 2327-2344.
[0174] [24] Zhu F, Zhang X Y, Wang C, et al. Prototype augmentation and self-supervision for incremental learning[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 5871-5880.
[0175] [25] De Lange M, Tuytelaars T. Continual prototype evolution: Learning online from non-stationary data streams[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2021: 8250-8259.
[0176] [26] Kirkpatrick J, Pascanu R, Rabinowitz N, et al. Overcoming catastrophic forgetting in neural networks[J]. Proceedings of the national academy of sciences, 2017, 114(13): 3521-3526.
[0177] [27] Zenke F, Poole B, Ganguli S. Continual learning through synaptic intelligence[C]//International conference on machine learning. Pmlr, 2017: 3987-3995.
[0178] [28] Li Z, Hoiem D. Learning without forgetting[J]. IEEE transactions on pattern analysis and machine intelligence, 2017, 40(12): 2935-2947.
[0179] [29] Park J, Ji A H, Park M, et al. MalCL: Leveraging gan-based generative replay to combat catastrophic forgetting in malware classification[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2025, 39(1): 658-666.
[0180] [30] Oord A, Li Y, Vinyals O. Representation learning with contrastive predictive coding[J]. arXiv preprint arXiv:1807.03748, 2018.
[0181] [31] Goswami D, Liu Y, Twardowski B, et al. Fecam: Exploiting the heterogeneity of class distributions in exemplar-free continual learning[J]. Advances in Neural Information Processing Systems, 2023, 36: 6582-6595.
[0182] [32] Ma J, Kulesza A, Dredze M, et al. Exploiting feature covariance in high-dimensional online learning[C]//Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, 2010: 493-500.
[0183] [33] Gu A, Dao T. Mamba: Linear-time sequence modeling with selective state spaces[C]//First conference on language modeling. 2024.
[0184] [34] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[J]. Advances in neural information processing systems, 2017, 30.
[0185] [35] Maaten L, Hinton G. Visualizing data using t-SNE[J]. Journal of machine learning research, 2008, 9(Nov): 2579-2605.
Claims
1. A malicious encrypted traffic detection method based on unsupervised domain incremental learning, which uses a traffic detection model to distinguish normal traffic from malicious traffic in input encrypted traffic, characterized in that: in scenarios where the network environment changes, the traffic detection model uses a prototype structure as the carrier for cross-domain knowledge transfer and learns within an unsupervised domain incremental learning (DIL) framework; the traffic detection model uses Mamba as the feature extraction backbone; in the initial training phase in the source domain, a bidirectional symmetric contrastive (BSC) loss drives the normal class and the malicious class to form compact and separated geometric structures in the latent space, and a diagonal covariance prototype is used as a lightweight cross-domain category knowledge carrier; in the continuous adaptation phase after the model is deployed to the target domains, a drift-aware prototype update mechanism based on the standardized Mahalanobis distance is adopted, combined with decision confidence screening and dynamic momentum adjustment.
2. The malicious encrypted traffic detection method based on unsupervised domain incremental learning according to claim 1, characterized in that the network environment change scenario is abstracted as follows: the network environment is a set of domains; the dataset of each environment consists of the feature vectors of its traffic samples, together with the total number of samples; the samples of the source domain carry complete benign and malicious labels, while the label information of the subsequently arriving target domains is completely unavailable; significant distributional differences exist between different domains, that is, the data distributions of any two distinct environments differ; and a normal prototype vector and a malicious prototype vector are defined for each environment.
3. The malicious encrypted traffic detection method based on unsupervised domain incremental learning according to claim 2, characterized in that the construction of the unsupervised DIL framework includes:
(I) Source domain prototype construction
The goal of the source domain training phase is to drive the encoder to form a discriminative class geometry in the latent space; the normal and malicious traffic prototypes are explicitly initialized, providing a structured starting point for continuous adaptation to the unlabeled target domains.
1) Bidirectional symmetric contrastive representation learning
The numbers of normal samples and malicious samples in a batch, and the corresponding sets of latent-space feature vectors, are denoted separately for the two classes. The similarity between samples is measured by cosine similarity, and the inter-class similarity between normal samples and malicious samples within a batch is calculated by equation (1), which contains a temperature hyperparameter. The source domain training phase employs the bidirectional symmetric contrastive (BSC) loss:
① the normal-class anchor loss is defined by equation (2), in which the two indexed terms each represent a single traffic sample;
② by the symmetry of cosine similarity, the inter-class similarity is shared between the normal class and the malicious class, so the malicious-class anchor loss is defined by equation (3);
③ the loss used for source domain training, the BSC loss, consists of the normal-class anchor loss and the malicious-class anchor loss and is given by equation (4).
2) Distribution modeling of the prototypes
After source domain training converges, the encoder forms a discriminative class geometry in the latent space. Given the sets of normal samples and malicious samples in the source domain, the normal traffic prototype and the malicious traffic prototype are initialized by equation (5). Based on the prototype vectors, the distribution range of each category in the feature space is further modeled: the direction and magnitude of each sample's deviation from its class center are recorded in every feature dimension, and a diagonal covariance is used to approximate the spatial distribution of each category. The diagonal covariances of the malicious traffic category and the normal traffic category are calculated by equations (6-1) and (6-2), in which the product is element-wise.
(II) Sample decision and prototype update in the target domain
After the source domain prototype structure is initialized, the model adapts sequentially to the sequence of unlabeled target domains, using an adaptive prototype update mechanism based on sample-level drift awareness.
1) Sample decision and selection based on Mahalanobis distance
The Mahalanobis distance is used as the distance metric between a sample and a prototype. For the feature vector of each sample arriving in the current target domain, the Mahalanobis distances to the two prototypes are calculated by equations (7-1) and (7-2), computed component by component over the feature vector, the prototype vector, and the diagonal covariance. The Mahalanobis distance lets the metric adaptively reflect the degree of intra-class dispersion in every dimension. The decision for the sample is then formalized as the binary variable of equation (8), whose two values indicate that the sample is judged to be malicious traffic or normal traffic. The absolute value of the difference between the sample's Mahalanobis distances to the two prototypes measures the decision confidence, as expressed by equation (9). Given a decision confidence threshold, only samples whose decision confidence satisfies the threshold condition are treated as high-confidence samples and enter the subsequent prototype update.
2) Drift-aware prototype update
The degree to which a sample deviates from the existing prototypes is characterized by its minimum Mahalanobis distance, as expressed by equation (10). Because the expected value of this distance grows linearly with the feature dimension, it is standardized so that it approximately follows a standard normal distribution, and the truncation threshold on the standardized Mahalanobis distance is taken as the standard normal quantile at the corresponding confidence level. The drift-aware momentum coefficient is then formalized by equation (11), which contains a drift sensitivity hyperparameter. When a sample simultaneously satisfies the dual screening criteria (the decision confidence threshold and the truncation threshold on the standardized Mahalanobis distance), the momentum coefficient decreases monotonically with the degree of distribution deviation, so that the more significant the drift, the larger the update step size. A sample that does not simultaneously satisfy the dual screening criteria is judged to be an extreme sample beyond the confidence boundary. For samples that satisfy the dual screening criteria, the prototype update applies only to the category to which the sample is assigned. Assuming the current sample is classified as normal, the normal prototype of the current domain is updated by equation (12), and the corresponding diagonal covariance is updated synchronously by equation (13). The covariance and prototype updates share the same momentum coefficient, which keeps the estimate of the distribution range and the calibration of the distribution center consistent in update magnitude.
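The decision, screening, and update steps of claim 3 can be illustrated with the following sketch. Equations (5) to (13) are not reproduced in this text, so several forms are assumptions made only for illustration: squared diagonal Mahalanobis distances are used throughout, the standardization is the chi-square style `(d - D) / sqrt(2 * D)`, and the momentum takes the hypothetical form `base_momentum * exp(-gamma * drift)`. The function and parameter names are likewise illustrative.

```python
import numpy as np

EPS = 1e-8


def init_prototype(features):
    """Class mean and per-dimension variance (diagonal covariance) of source features."""
    mu = features.mean(axis=0)
    dev = features - mu
    return mu, (dev * dev).mean(axis=0)  # element-wise product of deviations


def mahalanobis_sq(z, mu, var):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return float(np.sum((z - mu) ** 2 / (var + EPS)))


def adapt_step(z, protos, conf_threshold, z_quantile, gamma, base_momentum=0.99):
    """One online step; `protos` maps class name to [mu, var] and is updated in place.

    Returns (label, updated). Only samples passing both screens trigger an update.
    """
    dist = {name: mahalanobis_sq(z, mu, var) for name, (mu, var) in protos.items()}
    label = "malicious" if dist["malicious"] < dist["normal"] else "normal"
    confidence = abs(dist["normal"] - dist["malicious"])
    dim = z.shape[0]
    drift = (min(dist.values()) - dim) / np.sqrt(2.0 * dim)  # assumed standardization
    if confidence <= conf_threshold or drift > z_quantile:
        return label, False  # low-confidence or extreme sample: no update
    # Hypothetical momentum form: larger drift -> smaller momentum -> larger step.
    momentum = base_momentum * np.exp(-gamma * max(drift, 0.0))
    mu, var = protos[label]
    dev = z - mu
    protos[label][0] = momentum * mu + (1.0 - momentum) * z
    protos[label][1] = momentum * var + (1.0 - momentum) * dev * dev
    return label, True
```

Because the mean and the variance share one momentum value, a sample that shifts the class center also widens or narrows the class spread by a matching amount, which is the consistency property the claim describes.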
4. The malicious encrypted traffic detection method based on unsupervised domain incremental learning according to claim 1, 2 or 3, characterized in that forgetting during the learning process is countered by a collaborative anti-forgetting mechanism with two-layer anchoring, constructed from the encoder parameters and the decision space.
5. The malicious encrypted traffic detection method based on unsupervised domain incremental learning according to claim 3, characterized in that the unsupervised DIL framework further includes a collaborative anti-forgetting mechanism, comprising:
1) Encoder knowledge distillation
At each domain switch, the encoder parameters obtained from the previous domain adaptation are frozen and saved as the teacher model, while the encoder being adapted to the current domain serves as the student model, the two having separate parameter sets. For each sample arriving in the current target domain, the distillation loss is given by equation (14). For every such sample, equation (14) continuously constrains the decision of the student model so that it does not deviate from the decision of the teacher model. At each domain switch, the teacher encoder state is synchronously updated to the state obtained after the previous domain adaptation.
2) Prototype structure anchoring
Before entering each target domain, the normal and malicious traffic prototypes of the preceding domain are frozen and saved as historical anchors, and the prototype structure anchoring constraint is applied to the encoder throughout the current domain adaptation. With the historical anchor prototypes of the preceding domain fixed, and taking a sample judged to be normal as an example, the prototype anchoring regularization term is given by equation (15). Equation (15) makes the sample features generated by the current encoder continuously approach the historical anchor prototype of the decided class.
3) Joint constraint
The joint constraint loss composed of equations (14) and (15) is given by equation (16), which contains a hyperparameter; when a sample arrives, the encoder updates its parameters by minimizing this loss.
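A forward-pass sketch of the joint constraint in claim 5 is given below. Equations (14) to (16) are not reproduced in this text, so the concrete forms are assumptions: the distillation term is taken as the cross-entropy between the student's soft decision (a softmax over negative squared Mahalanobis distances) and the teacher's hard decision, the anchoring term is the squared Euclidean distance to the frozen anchor prototype of the decided class, and `lam` is a hypothetical weighting hyperparameter. In practice the loss would be written in an automatic differentiation framework so that it can drive encoder updates; NumPy is used here only to show the computation.

```python
import numpy as np

EPS = 1e-8


def class_distances(z, protos):
    """Squared diagonal Mahalanobis distance to each (mu, var) prototype, in order."""
    return np.array([np.sum((z - mu) ** 2 / (var + EPS)) for mu, var in protos])


def soft_decision(z, protos):
    """Softmax over negative distances, giving one probability per class."""
    logits = -class_distances(z, protos)
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def anti_forgetting_loss(z_student, z_teacher, anchor_protos, lam):
    """Assumed joint loss: distillation term + lam * prototype anchoring term.

    z_student / z_teacher: features of one sample from the current and frozen encoders.
    anchor_protos: frozen [(mu_normal, var_normal), (mu_malicious, var_malicious)].
    """
    teacher_label = int(np.argmin(class_distances(z_teacher, anchor_protos)))
    p_student = soft_decision(z_student, anchor_protos)
    distill = -np.log(p_student[teacher_label] + EPS)  # penalizes decision disagreement
    student_label = int(np.argmin(class_distances(z_student, anchor_protos)))
    anchor_mu = anchor_protos[student_label][0]
    anchor = np.sum((z_student - anchor_mu) ** 2)  # pull toward the decided anchor
    return float(distill + lam * anchor)
```

Under these assumptions the first term reacts when the current encoder would flip a decision the frozen encoder made, while the second keeps features close to the frozen class centers even when the decisions still agree.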