An adaptive intrusion detection method for industrial internet
By combining open set detection and hierarchical federated learning with a meta-learning mechanism, the problems of traditional intrusion detection systems in identifying unknown attacks and high computational complexity are solved, achieving efficient adaptive intrusion detection and data privacy protection in the industrial internet environment.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHENYANG LIGONG UNIV
- Filing Date
- 2026-04-21
- Publication Date
- 2026-06-23
AI Technical Summary
Traditional intrusion detection systems struggle to identify unknown attacks, and in the industrial internet environment, they suffer from high computational complexity, strong data heterogeneity, and high real-time requirements, leading to low model training efficiency and difficulty in protecting data privacy.
Employing open-set detection, hierarchical federated learning, and meta-learning mechanisms, combined with a dual-branch parallel feature extraction structure and a hierarchical weighted aggregation strategy, adaptive intrusion detection is achieved through a client-edge server-global server architecture, enabling rapid identification of unknown threats and continuous model evolution.
It effectively identifies unknown threats, improves the security and adaptability of industrial internet systems, solves the limitations of traditional models in dynamic and open environments, and achieves data privacy protection and efficient model training in heterogeneous environments.
Figure CN122268656A_ABST
Description
Technical Field
[0001] This invention belongs to the field of industrial internet security, and specifically relates to an adaptive intrusion detection method for the industrial internet.
Background Technology
[0002] The rapid development of the Industrial Internet has driven the intelligent transformation of the manufacturing industry, but it has also given rise to a variety of constantly evolving cyberattacks and security threats. Advanced persistent threats (APTs), zero-day vulnerability attacks, and other unknown threats are constantly emerging, posing a serious challenge to the stable operation of industrial production systems and to data security.
[0003] Traditional intrusion detection systems often rely on rule matching or model training based on known attack patterns, making it difficult to effectively identify unknown attacks. Furthermore, the industrial internet environment is characterized by widely distributed devices, highly heterogeneous data, high real-time requirements, and urgent data privacy protection needs, making centralized intrusion detection methods unsuitable.
[0004] While traditional Kolmogorov-Arnold networks (KANs) have demonstrated potential in complex data modeling due to their powerful nonlinear fitting capabilities, the computational mechanism of their core B-spline function has become a bottleneck for their implementation in industrial scenarios. The recursive nature of the B-spline function requires iterative solutions through continuity constraints on local nodes, with each node's computation depending on the outputs of its neighbors, forming a one-to-one sequential dependency chain. This mechanism not only leads to a surge in memory bandwidth consumption but also makes it difficult to utilize the parallel computing architectures of modern GPUs, resulting in low training efficiency. In federated learning scenarios, this problem is further amplified: industrial edge nodes typically have limited computing power, and the high computational complexity of KANs forces a reduction in local training rounds, thus affecting the convergence accuracy of the global model. Simultaneously, the long training time caused by sequential computation increases communication latency between edge nodes and the central server, violating the real-time requirements of the Industrial Internet.
[0005] Furthermore, the traditional FedAvg aggregation algorithm employs a linear weighting strategy based on the number of samples. In non-independent and identically distributed (Non-IID) scenarios, it is easily dominated by large-sample clients, which weakens the contribution of small-sample clients to model updates. This, in turn, affects the global model's ability to generalize to the overall data distribution and may lead to slow model convergence or even deviation from the optimal solution.
Summary of the Invention
[0006] To address the aforementioned technical problems, this invention provides an adaptive intrusion detection method for the Industrial Internet. Starting from the practical needs of responding to continuously evolving unknown network attacks, it integrates open set detection, hierarchical federated learning, and meta-learning mechanisms. This method can effectively perceive and adapt to new threats while protecting data privacy in a distributed environment. It breaks through the limitations of traditional models in dynamic open environments and improves the overall security and adaptability of the Industrial Internet system.
[0007] The main technical solution adopted in this invention is as follows:
[0008] An adaptive intrusion detection method for the Industrial Internet includes the following steps:
[0009] Step S1: Deploy an open-set model based on known network attack categories as the initial model on the global server, and distribute the model parameters to each edge server and its corresponding client;
[0010] Step S2: The client trains on the local dataset, identifies unknown network attack samples through the open set detection mechanism, and stores the unknown features in the local buffer;
[0011] Step S3: The client uploads the unknown features in the local buffer, which are then forwarded to the global server via the edge server. The global server performs cluster analysis on the aggregated unknown features to generate pseudo-labels and corresponding initial category centers.
[0012] Step S4: The client uses a meta-learning mechanism to perform rapid adaptation training with a small number of samples using pseudo-labeled samples, and obtains updated client-side local model parameters.
[0013] Step S5: The edge server aggregates the model parameters of the clients within the group, and the global server aggregates the model parameters of each edge server to generate an updated global model;
[0014] Step S6: Repeat steps S2 to S5 until the model performance converges or the preset number of iterations is reached;
[0015] Step S7: Deploy the trained global model on the client to perform intrusion detection on real-time network traffic.
[0016] Preferably, in step S1, the open set model employs a dual-branch parallel feature extraction structure, which includes:
[0017] The first branch is used to extract the original local features of the input;
[0018] The second branch is used to extract features through nonlinear transformation after normalizing the input;
[0019] The fusion module merges the outputs of the first branch and the second branch.
[0020] Preferably, in step S2, when the client trains on the local dataset, it maintains a dynamic centroid for each known network attack category and updates the model parameters using the CFE triple loss function. The dynamic centroids are updated using an exponential moving average, as shown in the following formula:
[0021] $c_k^{\text{new}} = \alpha\, c_k^{\text{old}} + (1-\alpha)\,\bar{f}_k$ (1);
[0022] where $c_k^{\text{old}}$ is the current centroid, $c_k^{\text{new}}$ is the updated centroid, $\bar{f}_k$ is the feature mean of the correctly predicted samples, $\alpha$ is the smoothing coefficient, and $k$ indexes the known network attack categories.
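[0023] Preferably, in step S2, the CFE triple loss function is expressed as:
[0024] $L_{\text{total}} = L_{CE} + \lambda_1 L_{F} + \lambda_2 L_{\text{EMD}}$ (2);
[0025] where $L_{\text{total}}$ is the total loss, $L_{CE}$ is the cross-entropy loss, $L_{F}$ is the Fisher loss, $L_{\text{EMD}}$ is the regular approximate EMD loss, and $\lambda_1$ and $\lambda_2$ are the loss weight hyperparameters.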
[0026] Preferably, the Fisher loss includes an intra-class loss and an inter-class loss, expressed as:
[0027] $L_{F} = L_{\text{intra}} - \beta\, L_{\text{inter}}$ (4);
[0028] where the intra-class loss $L_{\text{intra}}$ is expressed as:
[0029] $L_{\text{intra}} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{N_k}\sum_{i=1}^{N_k}\left\| f_i^{k} - c_k \right\|_2^2$ (5);
[0030] and the inter-class loss $L_{\text{inter}}$ is expressed as:
[0031] $L_{\text{inter}} = \frac{1}{K(K-1)}\sum_{p \neq q}\left\| c_p - c_q \right\|_2^2$ (6);
[0032] where $\beta$ is the balance factor, $K$ is the number of known categories, $N_k$ is the number of samples in the $k$-th class, $c_k$ is the class center, $i$ is the sample index, $f_i^{k}$ is the intermediate-layer feature output by the model for the $i$-th sample of the $k$-th class, $p$ and $q$ are two different category indexes, and $c_p$ and $c_q$ are the feature centers of the $p$-th and $q$-th classes.
[0033] Preferably, the regular approximate EMD loss is calculated through the following steps:
[0034] Calculate the gradient of Fisher loss with respect to the input samples, and generate perturbations using the fast gradient sign method to obtain perturbation samples;
[0035] The sliced Wasserstein distance between the original sample features and the perturbed sample features is calculated as the regular approximate EMD loss.
[0036] Preferably, in step S2, the open set detection mechanism uses a dynamic threshold to identify unknown samples. The method for setting the dynamic threshold is as follows: during the model validation stage, the distances from the features of all known class training samples to their nearest known class centroids are collected to form a known class distance set, and the preset percentile of this set is taken as the decision threshold.
[0037] Preferably, in step S3, the global server performs cluster analysis on the aggregated unknown features, specifically including:
[0038] A density-based clustering algorithm is used to cluster the unknown features, automatically determine the number of effective clusters, and identify noise samples;
[0039] Using the cluster centers of the density clustering results as initial cluster centers, K-means clustering is performed for fine-grained partitioning, and a globally unique pseudo-label is assigned to each final cluster.
[0040] After assigning pseudo-labels, a high-confidence subset is formed by selecting a preset proportion of samples from each cluster that are closest to the cluster center. The high-confidence subset, its corresponding pseudo-labels, and the cluster center of each cluster are then sent to each client as the initial category center.
[0041] Preferably, in step S4, the meta-learning mechanism is an ANIL-based meta-learning mechanism, and the specific method for using pseudo-labeled samples for rapid adaptation training with few samples is as follows:
[0042] The client matches the received pseudo-labels with the local set of unknown features to construct a pseudo-label dataset, and merges it with the known class dataset to form an expanded training set;
[0043] Few-shot tasks are constructed by sampling from an expanded training set, and each meta-task consists of a support set and a query set;
[0044] During the inner loop adaptation phase, the classification head parameters are updated using gradients with the support set, while the feature extractor parameters remain fixed.
[0045] During the outer loop update phase, the loss is calculated using the query set and the feature extractor parameters are updated.
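The inner and outer loops described above can be illustrated with a deliberately tiny example. The following is a minimal sketch, not the patented implementation: it assumes a scalar "feature extractor" `body`, a scalar "classification head" `head`, a squared-error loss instead of a classification loss, and a first-order approximation in the outer update. All function and parameter names are hypothetical.

```python
# Toy first-order ANIL-style step on the scalar model y_hat = head * (body * x).
# The regression setting and every name here are illustrative assumptions.

def mse_grads(body, head, batch):
    """Return (loss, dL/dbody, dL/dhead) of the mean squared error on batch."""
    n = len(batch)
    loss = g_body = g_head = 0.0
    for x, y in batch:
        err = head * body * x - y
        loss += err * err / n
        g_body += 2.0 * err * head * x / n
        g_head += 2.0 * err * body * x / n
    return loss, g_body, g_head


def anil_step(body, head, support, query,
              inner_lr=0.1, outer_lr=0.05, inner_steps=3):
    """One meta-task: adapt the head on the support set, update the body on the query set."""
    adapted_head = head
    for _ in range(inner_steps):
        # Inner loop: only the head moves, the body stays fixed.
        _, _, g_head = mse_grads(body, adapted_head, support)
        adapted_head -= inner_lr * g_head
    # Outer loop: query loss with the adapted head drives the body update
    # (first-order: the adapted head is treated as a constant).
    query_loss, g_body, _ = mse_grads(body, adapted_head, query)
    new_body = body - outer_lr * g_body
    return new_body, adapted_head, query_loss


support_set = [(1.0, 2.0), (2.0, 4.0)]
query_set = [(3.0, 6.0)]
new_body, adapted_head, query_loss = anil_step(1.0, 1.0, support_set, query_set)
```

In a real model the gradients would come from an automatic differentiation framework, and each meta-task would be sampled from the expanded training set.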
[0046] Preferably, the hierarchical model aggregation method in step S5 includes the following:
[0047] Intra-group aggregation phase: Each edge server merges the parameters of the client models under its jurisdiction, using a square root weighting mechanism based on the number of samples;
[0048] Global aggregation phase: The global server receives the intra-group aggregation models from each edge server and performs aggregation using a linear weighting mechanism based on the total number of samples.
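The two aggregation phases can be sketched as follows. This is a minimal illustration under stated assumptions: model parameters are flattened into plain lists of floats, and the helper names `weighted_average`, `aggregate_group`, and `aggregate_global` are hypothetical.

```python
import math


def weighted_average(param_vectors, weights):
    """Weighted average of equally sized parameter vectors."""
    total = sum(weights)
    dim = len(param_vectors[0])
    return [sum(w * p[d] for p, w in zip(param_vectors, weights)) / total
            for d in range(dim)]


def aggregate_group(client_params, client_sizes):
    """Intra-group phase: weights proportional to sqrt(sample count)."""
    return weighted_average(client_params, [math.sqrt(n) for n in client_sizes])


def aggregate_global(group_params, group_sizes):
    """Global phase: weights proportional to each group's total sample count."""
    return weighted_average(group_params, group_sizes)


# Group A: a large client (900 samples) and a small client (100 samples).
group_a = aggregate_group([[1.0], [5.0]], [900, 100])
# Group B: a single client with 1000 samples.
group_b = aggregate_group([[4.0]], [1000])
global_model = aggregate_global([group_a, group_b], [1000, 1000])
```

In group A the square-root weights are 30 and 10, so the small client contributes 25% of the group model, compared with 10% under linear sample-count weighting.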
[0049] Beneficial effects: This invention provides an adaptive intrusion detection method for the Industrial Internet, which has the following advantages:
[0050] (1) This invention constructs a highly discriminative feature space by combining an open set detection mechanism with dynamic threshold setting, which can effectively distinguish between known attacks and unknown attacks, overcome the technical defects of traditional intrusion detection systems that are difficult to identify unknown threats, and significantly improve the overall security of industrial internet systems.
[0051] (2) By constructing a hierarchical federated learning framework and combining a meta-learning mechanism with a clustering labeling algorithm, this invention can integrate locally discovered unknown threats into global cognition while protecting data privacy, and enable the model to quickly adapt to new attack categories. This mechanism breaks through the limitations of traditional models in dynamic open environments, realizes the continuous evolution of the model, and significantly improves its adaptability to continuously evolving unknown threats.
[0052] (3) By adopting a three-layer collaborative architecture of “client-edge server-global server” combined with a hierarchical weighted aggregation strategy, this invention effectively alleviates the impact of non-independent and identically distributed data on federated learning in the industrial internet environment. It overcomes the problem that, in heterogeneous scenarios, the traditional federated averaging algorithm is dominated by large-sample clients while small-sample clients contribute too little, and it improves the convergence stability and generalization performance of the model in heterogeneous environments.
Attached Figure Description
[0053] Figure 1 is a schematic diagram of the distributed network structure of the present invention.
[0054] Figure 2 is a schematic diagram of the dual-path Kolmogorov-Arnold open set model structure of the present invention.
Detailed Implementation
[0055] To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this application are clearly and completely described below. Obviously, the described embodiments are only some embodiments of this application, and not all embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of this application.
[0056] Example 1
[0057] As shown in Figure 1, this invention constructs a three-layer collaborative industrial internet architecture of "client-edge server-global server" to support model training and knowledge fusion in a distributed environment. The client layer consists of various intelligent devices and sensors deployed in the industrial field, responsible for data collection and local model training; the edge server is deployed near the factory, responsible for model aggregation and feature uploading for the clients within its jurisdiction; and the global server, as the core data hub, is responsible for global model aggregation and cluster analysis of unknown features.
[0058] In one alternative implementation, the adaptive intrusion detection method is implemented as follows:
[0059] Step S1: System initialization. Deploy an open-set model based on known network attack categories as the initial model on the global server, and distribute the model parameters to each edge server and its corresponding client.
[0060] Step S2: Model Training. The client trains the model on the local dataset, identifies unknown samples using an open-set detection mechanism, and stores the unknown features in a local buffer.
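[0061] Step S3: Unknown Feature Aggregation and Cluster Labeling. The client uploads the unknown features in the local buffer, which are forwarded to the global server via the edge server. The global server performs cluster analysis on the aggregated unknown features to generate pseudo-labels and the corresponding initial category centers.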
[0062] Step S4: Model Adaptive Expansion. The global server distributes the pseudo-labels and initial class centers to each client. The client expands the classification head dimension according to the new class and, based on the meta-learning mechanism, performs rapid adaptation training with a small number of samples using pseudo-labeled samples to obtain updated local model parameters.
[0063] Step S5: Hierarchical model aggregation. The edge servers aggregate the model parameters of clients within the group, and the global server aggregates the model parameters of each edge server to generate an updated global model.
[0064] Step S6: Iterative optimization. Repeat steps S2 to S5 for multiple rounds of federated training until the model performance converges or the preset number of global iterations is reached.
[0065] Step S7: Deployment and Inference. Deploy the trained global model to various industrial field clients, extract features and calculate distances from real-time network traffic, and classify known network attacks and identify unknown network attacks based on dynamic thresholds.
[0066] Regarding the dual-branch feature extraction structure of the open set model
[0067] In one alternative implementation, the dual-path Kolmogorov-Arnold open set model (DPKAN) is used as the initial model. As shown in Figure 2, the open set model employs a dual-branch parallel feature extraction structure that combines FastKAN with a one-dimensional convolutional neural network (Conv1D). This structure includes:
[0068] The first branch extracts the original local features of the input through a one-dimensional convolutional layer followed by ReLU activation.
[0069] The second branch applies instance normalization to the input, performs a nonlinear transformation through radial basis functions, and then extracts features through a one-dimensional convolutional layer.
[0070] The fusion module adds the outputs of the two branches element by element and then processes them through ReLU activation and max pooling operations.
[0071] This dual-branch structure combines the advantages of multi-scale feature fusion and residual connections, enhancing the model's expressive power while stabilizing the training process. This enables the network to have stronger generalization ability and recognition performance when processing complex one-dimensional signals. Through the synergistic effect of linear convolution and radial basis functions, this structure achieves adaptive feature enhancement while maintaining parameter efficiency, balancing local feature extraction with global nonlinear modeling.
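The data flow through the two branches and the fusion module can be sketched in plain Python. This is a simplified illustration rather than the DPKAN model itself: it assumes a single input channel, a single output channel, fixed example weights, and Gaussian radial basis functions of the form exp(-((x - c) / h)^2). All function names, kernel values, and RBF centers are hypothetical.

```python
import math


def conv1d(channels, kernels, bias=0.0):
    """Valid-mode 1D convolution over C input channels, one output channel."""
    k = len(kernels[0])
    length = len(channels[0])
    return [bias + sum(kernels[c][j] * channels[c][t + j]
                       for c in range(len(channels)) for j in range(k))
            for t in range(length - k + 1)]


def relu(seq):
    return [max(0.0, v) for v in seq]


def instance_norm(seq, eps=1e-5):
    """Normalize one sample to zero mean and unit variance."""
    mean = sum(seq) / len(seq)
    var = sum((v - mean) ** 2 for v in seq) / len(seq)
    return [(v - mean) / math.sqrt(var + eps) for v in seq]


def rbf_expand(seq, centers, width=1.0):
    """Gaussian RBF: one channel per center, each value computed independently."""
    return [[math.exp(-((v - c) / width) ** 2) for v in seq] for c in centers]


def max_pool(seq, size=2):
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


def dual_branch(x, conv_kernel, rbf_centers, rbf_kernels):
    # Branch 1: Conv1D on the raw input, then ReLU.
    branch1 = relu(conv1d([x], [conv_kernel]))
    # Branch 2: instance normalization -> RBF transform -> Conv1D.
    branch2 = conv1d(rbf_expand(instance_norm(x), rbf_centers), rbf_kernels)
    # Fusion: element-wise addition, ReLU, max pooling.
    fused = [a + b for a, b in zip(branch1, branch2)]
    return max_pool(relu(fused))


x = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2]
features = dual_branch(x, [0.5, -0.2, 0.1], [-1.0, 0.0, 1.0], [[0.1, 0.1, 0.1]] * 3)
```

Each RBF value depends only on one input value and one center, which is why this transform parallelizes easily compared with recursive B-spline evaluation.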
[0072] FastKAN represents a fundamental improvement to the traditional Kolmogorov-Arnold Network (KAN) computation mechanism. Its core lies in replacing the B-spline function with a radial basis function (RBF) and utilizing a Gaussian kernel function to achieve nonlinear mapping. Unlike the recursive iteration of B-splines, RBF computation inherently possesses parallelism: the distance between each input feature and the center of the radial basis function can be calculated independently, without relying on intermediate results from other nodes, thus perfectly adapting to the parallel computing paradigm of GPUs. This improvement increases single-step computation efficiency several times, not only shortening model training time but also reducing the computational demands on edge devices. Under the same hardware conditions, FastKAN can support more local training epochs, providing better adaptability for edge nodes in federated learning.
[0073] Combining FastKAN with convolutional neural networks fully leverages their complementary advantages: FastKAN adaptively adjusts feature representations through radial basis functions, while convolutional neural networks utilize multi-scale convolution and pooling operations to enhance the ability to capture complex patterns and achieve multi-dimensional feature fusion. Simultaneously, the introduction of mechanisms such as layer normalization and Dropout further improves the model's generalization ability and training stability, enabling the entire feature extractor to achieve a dual improvement in feature extraction capability and robustness while maintaining computational efficiency.
[0074] Regarding the open set detection mechanism
[0075] The system is trained on a dataset without unknown network attack categories, maintains dynamic centroids for each category, and updates model parameters using the CFE triple loss function. When an unknown class appears in the dataset, the system calculates the minimum distance from the sample features to the centroids of each category, identifies the unknown samples using dynamic thresholds, and stores the unknown features in a local buffer.
[0076] Regarding the dynamic centroid mechanism
[0077] In one optional implementation, the model dynamically updates the centroid of each known network attack category during training. Specifically, for each input sample, the model calculates the distance between its feature vector and the centroids of all known categories, and selects the category corresponding to the nearest centroid as the predicted category. For correctly predicted samples, the centroid of the corresponding category is updated using its feature vector. To avoid drastic changes in the centroids, this embodiment uses an exponential moving average algorithm for updating, meaning the new centroid is a weighted average of the current centroid and the newly calculated centroid. The formula is as follows:
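[0078] $c_k^{\text{new}} = \alpha\, c_k^{\text{old}} + (1-\alpha)\,\bar{f}_k$ (1);
[0079] where $c_k^{\text{old}}$ is the current centroid, $c_k^{\text{new}}$ is the updated centroid, $\bar{f}_k$ is the feature mean of the correctly predicted samples, $\alpha$ is the smoothing coefficient, and $k$ indexes the known network attack categories.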
[0080] During the testing process, the distance between the sample features and the centroid of known network attack categories is calculated and compared with a set dynamic threshold to distinguish between known and unknown network attack categories.
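The centroid update and the distance-based decision can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and assumes the smoothing coefficient `alpha` weights the current centroid; the class names, the value `alpha=0.9`, and the threshold used in the example are hypothetical.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_class(feature, centroids):
    """Return (label, distance) of the closest centroid; centroids maps label -> vector."""
    return min(((k, distance(feature, c)) for k, c in centroids.items()),
               key=lambda item: item[1])


def update_centroids(centroids, features, labels, alpha=0.9):
    """EMA update using the mean feature of correctly predicted samples per class."""
    correct = {}
    for f, y in zip(features, labels):
        predicted, _ = nearest_class(f, centroids)
        if predicted == y:
            correct.setdefault(y, []).append(f)
    updated = dict(centroids)
    for k, feats in correct.items():
        mean = [sum(col) / len(feats) for col in zip(*feats)]
        updated[k] = [alpha * c + (1.0 - alpha) * m
                      for c, m in zip(centroids[k], mean)]
    return updated


def detect(feature, centroids, tau):
    """Known class if the nearest-centroid distance is within tau, else 'unknown'."""
    label, d = nearest_class(feature, centroids)
    return label if d <= tau else "unknown"


centroids = {"dos": [0.0, 0.0], "probe": [10.0, 10.0]}
batch = [[1.0, 1.0], [9.0, 9.0], [6.0, 6.0]]
batch_labels = ["dos", "probe", "dos"]   # the third sample is mispredicted as "probe"
centroids = update_centroids(centroids, batch, batch_labels)
```

The mispredicted third sample is excluded from the update, so it cannot drag the "dos" centroid toward the "probe" region.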
[0081] Regarding the composite loss function
[0082] In one alternative implementation, to construct an open set recognition model that simultaneously optimizes classification accuracy, feature discriminative power, and distribution matching ability, the CFE triple loss function is used for model optimization. Cross-entropy loss ensures correct classification of known categories, Fisher loss enhances intra-class compactness and inter-class separation, and EMD loss is introduced to accurately measure the optimal transmission distance between the feature distribution and the category center distribution, thereby achieving accurate classification of known categories and effective detection of unknown categories. The expression for the CFE triple loss function is as follows:
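[0083] $L_{\text{total}} = L_{CE} + \lambda_1 L_{F} + \lambda_2 L_{\text{EMD}}$ (2);
[0084] where $L_{\text{total}}$ is the total loss, $L_{CE}$ is the cross-entropy loss, $L_{F}$ is the Fisher loss, $L_{\text{EMD}}$ is the regular approximate EMD loss, and $\lambda_1$ and $\lambda_2$ are the loss weight hyperparameters.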
[0085] The cross-entropy loss $L_{CE}$ is the core loss function in the supervised learning of this open set recognition model. It guides the model to optimize the classification accuracy of known classes by quantifying the difference between the model's predicted probability distribution for known-class samples and the true label distribution, thus providing a supervisory signal for the fundamental goal of "accurately identifying known classes." In multi-class scenarios, the cross-entropy loss $L_{CE}$ is expressed as follows:
[0086] $L_{CE} = -\sum_{i} y_i^{\top} \log \hat{y}_i$ (3);
[0087] where $i$ is the sample index, $y_i$ is the one-hot encoding of the true label, and $\hat{y}_i$ is the class probability distribution obtained after the model output has been normalized using softmax, ensuring that the probabilities of all classes sum to 1. Through the logarithm, this expression amplifies the deviation between the predicted probabilities and the true labels, so that the loss value reflects the classification error and provides a clear direction for updating the model parameters.
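As a small worked example of the softmax and cross-entropy computation described above, the following sketch sums the per-sample losses over a batch. Whether the loss is summed or averaged over the batch is an assumption here, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax: probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Sum over samples of -log p(true class); labels are class indexes."""
    loss = 0.0
    for logits, y in zip(batch_logits, labels):
        # With a one-hot target, only the true-class term survives.
        loss -= math.log(softmax(logits)[y])
    return loss


uniform_loss = cross_entropy([[0.0, 0.0]], [0])      # two equally likely classes
confident_loss = cross_entropy([[5.0, 0.0]], [0])    # true class strongly favored
```

A uniform prediction over two classes gives a loss of log 2, and the loss shrinks toward zero as the probability of the true class approaches 1.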
[0088] The Fisher loss $L_{F}$ is designed based on the Fisher discriminant criterion. Its objective comprises two aspects: minimizing the intra-class loss and maximizing the inter-class distance. By optimizing these two objectives, the model can learn a more discriminative feature space, thereby improving classification performance and open-set detection capability. Therefore, $L_{F}$ consists of two parts, and the specific formulas are as follows:
[0089] $L_{F} = L_{\text{intra}} - \beta\, L_{\text{inter}}$ (4);
[0090] $L_{\text{intra}} = \frac{1}{K}\sum_{k=1}^{K}\frac{1}{N_k}\sum_{i=1}^{N_k}\left\| f_i^{k} - c_k \right\|_2^2$ (5);
[0091] $L_{\text{inter}} = \frac{1}{K(K-1)}\sum_{p \neq q}\left\| c_p - c_q \right\|_2^2$ (6);
[0092] where $\beta$ is the balance factor, $K$ is the number of known categories, $N_k$ is the number of samples in the $k$-th class, $c_k$ is the class center, $i$ is the sample index, $f_i^{k}$ is the intermediate-layer feature output by the model for the $i$-th sample of the $k$-th class, $p$ and $q$ are two different category indexes, and $c_p$ and $c_q$ are the feature centers of the $p$-th and $q$-th classes. By minimizing this loss, the goals of intra-class compactness and inter-class separation can be achieved simultaneously.
[0093] In this embodiment, the mechanism of Fisher loss is as follows:
[0094] The intra-class loss $L_{\text{intra}}$ quantifies how tightly same-class sample features gather around their class center, and is calculated as the mean of the squared distances between all sample features and their corresponding class centers. A smaller value indicates stronger compactness of same-class features and smaller intra-class deviation; minimizing $L_{\text{intra}}$ therefore reduces the intra-class distance. The inter-class loss $L_{\text{inter}}$, the mean of the squared distances between all pairs of class centers, measures the dispersion of features across different classes; a larger value indicates higher class discrimination. Since the optimization objective must be unified as a minimization problem, the two terms are connected by a negative sign. The balance factor $\beta$ adjusts the priority of the two terms: if $\beta$ is too small, the inter-class distance may be insufficient, leading to overlap of known-class features; if $\beta$ is too large, intra-class compactness may be sacrificed, weakening the stability of known-class classification. In this embodiment, the value of $\beta$ is obtained by grid search.
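The intra-class and inter-class terms can be computed directly from their verbal definitions. The sketch below assumes the intra-class term averages per class and then across classes, that the two terms are combined as intra minus beta times inter, and that there are at least two non-empty classes. The function names and `beta=0.5` are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fisher_loss(features_by_class, centers, beta=0.5):
    """Return (loss, intra, inter) for {label: [features]} and {label: center}."""
    labels = list(centers)
    # Intra-class term: mean squared distance of features to their own center.
    intra = sum(
        sum(sq_dist(f, centers[k]) for f in features_by_class[k])
        / len(features_by_class[k])
        for k in labels
    ) / len(labels)
    # Inter-class term: mean squared distance over ordered pairs of distinct centers.
    pairs = [(p, q) for p in labels for q in labels if p != q]
    inter = sum(sq_dist(centers[p], centers[q]) for p, q in pairs) / len(pairs)
    return intra - beta * inter, intra, inter


centers = {0: [0.0, 0.0], 1: [2.0, 0.0]}
features_by_class = {
    0: [[0.0, 1.0], [0.0, -1.0]],
    1: [[2.0, 1.0], [2.0, -1.0]],
}
loss, intra, inter = fisher_loss(features_by_class, centers)
```

Here every feature lies at squared distance 1 from its center and the two centers are at squared distance 4, so the loss is 1 - 0.5 * 4 = -1. Pulling features toward their centers or pushing the centers apart both lower it.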
[0095] In open set recognition tasks, relying solely on cross-entropy loss and Fisher loss is insufficient to accurately characterize the "unknown regions" of the feature space. Therefore, this embodiment introduces the Earth Mover's Distance (EMD) as a third supervisory signal. By measuring the optimal transport cost between the "clean feature distribution" and the "feature distribution after adversarial perturbation," it forces the network to maintain consistent output within local neighborhoods, thereby compressing the feature volume of known classes and indirectly leaving a larger blank area for unknown classes.
[0096] To generate the perturbation direction that best exposes the "cracks" in the feature space, this embodiment employs the Fast Gradient Sign Method (FGSM) to perform one gradient ascent step on the Fisher loss. Given a batch of input samples $x$, with the current network parameters fixed, first calculate the gradient of the Fisher loss with respect to the input samples:
[0097] $g = \nabla_{x} L_{F}\big(f(x),\, y,\, C\big)$ (7);
[0098] where $g$ is the gradient vector of the Fisher loss with respect to the input samples, $f(x)$ is the intermediate-layer feature output of the model, $y$ is the true label of the input samples, and $C$ is the set of feature centers of all categories. Because $L_{F}$ simultaneously measures intra-class dispersion and inter-class separation, its gradient direction corresponds to the perturbation direction that "maximizes intra-class variance and minimizes inter-class distance." Subsequently, the sign of the gradient is taken and scaled to a preset magnitude $\epsilon$ to obtain the worst-case perturbed samples:
[0099] $x' = x + \epsilon \cdot \operatorname{sign}(g)$ (8);
[0100] where $\epsilon$ is the noise intensity hyperparameter, set to 0.03. A perturbation of this size does not destroy the semantic information of the sample, yet still produces a measurable shift in the feature distribution, making the EMD loss more sensitive to the "loose distribution" phenomenon.
[0101] After obtaining the perturbed samples, the original features $f(x)$ and the perturbed features $f(x')$ are projected onto $M$ random directions $\theta_m$, and the sliced Wasserstein distance is calculated as the average one-dimensional Wasserstein distance $W$ over these projections:
[0102] $L_{\text{EMD}} = \frac{1}{M}\sum_{m=1}^{M} W\big(\theta_m^{\top} f(x),\ \theta_m^{\top} f(x')\big)$ (9);
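The sign-and-scale step can be illustrated on a toy loss. This sketch assumes an identity feature extractor and uses a central finite-difference gradient in place of backpropagation; `toy_intra_loss`, `numerical_gradient`, and `fgsm_perturb` are hypothetical names, and only the value epsilon = 0.03 comes from the text.

```python
def numerical_gradient(loss_fn, x, h=1e-5):
    """Central finite-difference gradient of loss_fn at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + h] + x[i + 1:]
        down = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((loss_fn(up) - loss_fn(down)) / (2.0 * h))
    return grad


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(loss_fn, x, epsilon=0.03):
    """Move every input coordinate by epsilon in the direction that increases the loss."""
    g = numerical_gradient(loss_fn, x)
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, g)]


# Toy stand-in for the Fisher loss: squared distance to the sample's class center,
# with an identity feature extractor (illustrative assumption).
class_center = [0.0, 1.0]


def toy_intra_loss(x):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, class_center))


sample = [0.5, 0.2]
perturbed = fgsm_perturb(toy_intra_loss, sample)
```

The perturbed sample moves away from its class center by 0.03 in each coordinate, which raises the toy loss from 0.89 to about 0.97.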
[0103] By minimizing the EMD loss, the network learns to output almost unchanged features even when the input is slightly perturbed, essentially "squeezing" similar samples into a more compact cluster. With a smaller cluster size, unknown samples are less likely to be mistakenly pulled into known class regions; at the same time, the model is less sensitive to noise, and there is more leeway in using simple distance thresholds to distinguish between known and unknown classes.
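A sliced Wasserstein distance between two equally sized feature batches can be sketched with random projections and sorting. The sketch assumes the one-dimensional Wasserstein-1 distance (mean absolute difference of sorted projections) and 50 Gaussian random directions; neither choice is stated in the text, and the function name is hypothetical.

```python
import math
import random


def sliced_wasserstein(feats_a, feats_b, num_projections=50, seed=0):
    """Average 1D Wasserstein-1 distance over random unit directions.

    feats_a and feats_b must contain the same number of feature vectors.
    """
    rng = random.Random(seed)
    dim = len(feats_a[0])
    total = 0.0
    for _ in range(num_projections):
        # Draw a random direction and normalize it to unit length.
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project both batches onto the direction and sort the projections.
        proj_a = sorted(sum(t * v for t, v in zip(theta, f)) for f in feats_a)
        proj_b = sorted(sum(t * v for t, v in zip(theta, f)) for f in feats_b)
        # In 1D, optimal transport pairs the sorted values in order.
        total += sum(abs(a - b) for a, b in zip(proj_a, proj_b)) / len(proj_a)
    return total / num_projections


clean = [[0.0, 0.0], [1.0, 0.5], [0.3, 0.8], [0.9, 0.1]]
shifted = [[a + 1.0, b] for a, b in clean]
same_distance = sliced_wasserstein(clean, clean)
shift_distance = sliced_wasserstein(clean, shifted)
```

Identical batches give a distance of zero, while a unit shift along one axis gives a positive value no larger than the shift itself.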
[0104] Regarding dynamic threshold settings
[0105] In one alternative implementation, a percentile-based dynamic threshold setting method is employed. During the model validation phase, for all samples belonging to known categories (i.e., those whose labels belong to the category set $\mathcal{Y}_{\text{known}}$ already covered in the training set), the distance between each sample's feature vector and the centroids of all known classes is calculated, and the minimum value is taken as the distance between that sample and its "nearest known class". Let the total number of known-class samples be $N$. A set of known-class distances $D = \{d_1, d_2, \ldots, d_N\}$ can then be constructed, where $d_i$ is the distance from the $i$-th known-class sample to its nearest known-class centroid.
[0106] The $k$-th percentile of the set $D$ is selected as the final decision threshold $\tau$, according to the following formula:
[0107] $\tau = \operatorname{Percentile}(D, k)$ (10);
[0108] where $\operatorname{Percentile}(\cdot)$ is the percentile function; its meaning is that, in the known-class distance set, $k\%$ of the sample distances are less than or equal to this threshold.
[0109] This dynamic threshold setting method is data-adaptive. In open set recognition tasks, the feature space distribution of different datasets varies significantly, such as intra-class clustering and feature scale. If a fixed threshold is used, for datasets with small overall intra-class distances, too many known class samples may be misclassified as unknown classes; while for datasets with large overall intra-class distances, the false negative rate for unknown classes may increase. In contrast, this embodiment sets a dynamic threshold based on the known class distance distribution, which can establish a quantifiable trade-off mechanism between "completeness of known class recognition" and "accuracy of unknown class detection" by adjusting the percentile k.
[0110] Taking $k = 95$ as an example, the threshold is the 95th percentile of the known-class distance set, $\tau = \operatorname{Percentile}(D, 95)$. In other words, the 95% of known-class samples whose distance is less than or equal to this threshold are correctly identified as belonging to a known class, while the remaining 5%, whose distance exceeds the threshold, may be misidentified as unknown. In practical applications, the decision boundary can be flexibly controlled by adjusting $k$ according to the tolerance for false positives and false negatives.
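The threshold selection can be sketched with a nearest-rank percentile, which returns the smallest observed distance that at least k% of the distances do not exceed. The nearest-rank convention (as opposed to interpolation) is an assumption, and the function name is hypothetical.

```python
import math


def percentile_threshold(distances, k=95):
    """Nearest-rank k-th percentile of the known-class distance set."""
    ordered = sorted(distances)
    # Rank of the smallest element with at least k% of values <= it.
    rank = max(1, math.ceil(k * len(ordered) / 100))
    return ordered[rank - 1]


# Twenty validation distances: 1.0, 2.0, ..., 20.0.
known_distances = [float(d) for d in range(1, 21)]
tau = percentile_threshold(known_distances, k=95)
accepted = sum(1 for d in known_distances if d <= tau)
```

With twenty distances and k = 95, the threshold is the 19th smallest distance, so 19 of the 20 known-class samples fall within it and one would be flagged as unknown.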
[0111] Cluster analysis and pseudo-label generation
[0112] The unknown features received by the global server from various clients are scattered and heterogeneous. In one optional implementation, to integrate these scattered local discoveries into a globally unified understanding, i.e., to identify new attack categories with commonalities, a two-stage clustering and annotation method is proposed:
[0113] The first stage involves using density-based clustering algorithms (such as DBSCAN) to cluster unknown features, automatically determining the number of valid clusters and identifying noise samples. The specific method is as follows:
[0114] Let the set of unknown features collected by the global server be . Given radius parameter With the minimum number of samples MinPts, for any sample ,That The neighborhood is defined as:
[0115] (13);
[0116] like Then it is called Core points are defined as points within a cluster. Density-based clustering algorithms (such as DBSCAN) group mutually reachable core points and their neighboring boundary points into the same cluster based on the density reachability and density connectivity of these core points. Conversely, clusters that are neither core points nor located within any core point cluster are grouped together. Points within the neighborhood are labeled as noise.
[0117] In this embodiment, when the global server executes a density clustering algorithm (such as DBSCAN), it focuses on the number of identified valid clusters, the number of samples and spatial compactness within each cluster, and the sparse sample set judged as noise. By adjusting the radius parameter $\varepsilon$ and MinPts, DBSCAN strikes a balance between over-subdivision and over-merging. Ultimately, the number of effective clusters $m$ output by DBSCAN is taken as an estimate of the number of potentially unknown categories in the current global scope, while the noise set is temporarily retained and does not participate in the subsequent center refinement and pseudo-label generation.
[0118] The second stage: Using the cluster centers from the clustering results of density clustering algorithms (such as DBSCAN) as initial cluster centers, K-means clustering is performed for fine-grained partitioning. A globally unique pseudo-label is assigned to each final cluster, and the 50% of samples closest to the cluster center are selected to form a high-confidence subset. This subset, along with the pseudo-label and cluster centers, is then sent to the client. The specific method is as follows:
[0119] The $m$ clusters $C_1, \dots, C_m$ output by DBSCAN serve as the initialization basis for the second-stage K-means clustering, enabling further refinement of cluster centers and unified allocation of pseudo-labels.
[0120] For each DBSCAN cluster $C_j$, the mean vector of its sample features is calculated as the initial cluster center:
[0121] $$\mu_j^{(0)} = \frac{1}{|C_j|} \sum_{u \in C_j} u \quad (14);$$
[0122] After removing noisy samples, the standard K-means process is performed on the remaining samples. In each iteration, the samples are first assigned to the nearest center:
[0123] $$c(u) = \arg\min_{j \in \{1, \dots, m\}} \lVert u - \mu_j \rVert_2^2 \quad (15);$$
[0124] The center of each cluster is then updated to the mean of the samples assigned to it, iterating until the cluster assignments no longer change. After a finite number of iterations, a converged cluster partition $\{C_1^*, \dots, C_m^*\}$ and its corresponding centers $\{\mu_1^*, \dots, \mu_m^*\}$ are obtained.
[0125] For each cluster, the global server assigns a globally unique pseudo-label ID. The samples within a cluster are sorted according to their distance from the cluster center, and the 50% of samples closest to the center are selected to form a high-confidence subset. These samples, along with their pseudo-labels, are then distributed to all clients. To facilitate rapid client adaptation to new categories, the global server also sends the mean vector of each cluster, which is used as the initial centroid of the corresponding class during local training, thereby providing a stable initial centroid for the new class while controlling pseudo-label noise.
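The second stage can be sketched as follows, assuming the noise samples have already been removed and the first-stage clusters are available. The function names, the handling of empty clusters, and the toy samples are assumptions for illustration; only the seeding from first-stage cluster means, the nearest-center assignment, and the selection of the closest 50% follow the description above.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refine_with_kmeans(samples, init_centers, max_iter=100):
    """Standard K-means iterations started from the given centers."""
    centers = [list(c) for c in init_centers]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))
                      for x in samples]
        if new_assign == assign:
            break  # assignments no longer change
        assign = new_assign
        for j in range(len(centers)):
            members = [x for x, a in zip(samples, assign) if a == j]
            if members:  # assumption: keep the old center if a cluster becomes empty
                centers[j] = mean_vector(members)
    return assign, centers

def high_confidence_subset(samples, assign, centers, ratio=0.5):
    """Per cluster, keep the `ratio` fraction of samples closest to the center."""
    subset = {}
    for j, center in enumerate(centers):
        members = sorted((x for x, a in zip(samples, assign) if a == j),
                         key=lambda x: euclidean(x, center))
        keep = max(1, math.ceil(ratio * len(members))) if members else 0
        subset[j] = members[:keep]
    return subset

# Toy first-stage clusters with noise already removed (illustrative values only).
stage_one = [
    [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)],
    [(10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)],
]
samples = [x for cluster in stage_one for x in cluster]
init_centers = [mean_vector(cluster) for cluster in stage_one]
assign, centers = refine_with_kmeans(samples, init_centers)
subset = high_confidence_subset(samples, assign, centers, ratio=0.5)
```

Each key of `subset` plays the role of a pseudo-label ID here; a real deployment would map these indices to globally unique identifiers before distribution.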
[0126] Regarding the rapid adaptation mechanism of meta-learning
[0127] In one alternative implementation, a parameter-isolation-based meta-learning mechanism (such as ANIL) is employed for rapid adaptation training with a small number of samples. The specific process is as follows:
[0128] During the open-set training phase, the client uses a feature extractor $f_{\theta_k}$ built based on DPKAN, where $\theta_k$ denotes the model parameters of client k. For an input sample $x$, its feature is represented as $z = f_{\theta_k}(x) \in \mathbb{R}^d$, where d is the feature dimension. A fully connected layer is used as the classification head to implement the classification task of known classes.
[0129] Each client maintains a dynamic centroid for each known class. These dynamic centroids are updated online using an exponentially weighted moving average mechanism, allowing them to smoothly adapt to slow changes in data distribution. Regarding feature space optimization, this stage introduces the CFE triple loss function. By constructing combinations of anchor samples, positive samples, and negative samples, it drives similar samples to cluster in the feature space, while dissimilar samples move away from each other, thereby enhancing the discriminative power of the features.
[0130] Based on the optimized feature space, the open set detection module identifies unknown samples by calculating the minimum Euclidean distance from sample features to all known class centroids. The system sets a dynamic threshold τ based on the percentile of the current sample distance distribution. When the minimum distance of a sample is greater than this threshold, it is determined to be an unknown class sample. The features of the identified unknown samples are stored in a local buffer to provide necessary training material for subsequent meta-learning stages.
[0131] Once the client obtains the pseudo-labels, it matches them against its local set of unknown features and constructs a pseudo-label dataset $\{(u_i, \hat{y}_i)\}$, where $\hat{y}_i$ is the pseudo-label issued by the server for the local unknown feature $u_i$, and $m$ denotes the number of new categories discovered. The client merges this pseudo-label dataset with the original known-class dataset to form an expanded training set. Based on this expanded dataset, the client uses the ANIL meta-learning framework for model optimization.
[0132] The meta-training process is achieved by constructing few-shot tasks from the expanded dataset. Each meta-task $\mathcal{T}_i$ consists of a support set $S_i$ and a query set $Q_i$ and is constructed according to the N-way K-shot configuration. During the inner loop adaptation phase, the model rapidly adjusts the classification head parameters using the support set, with cross-entropy loss as the optimization objective of the inner loop, calculated as follows:
[0133] $$\mathcal{L}_{\text{inner}}(\phi) = -\frac{1}{|S_i|} \sum_{(x_j, y_j) \in S_i} \log p_{\phi}\big(y_j \mid f_{\theta}(x_j)\big) \quad (11);$$
[0134] where $p_{\phi}$ is the class probability output by the classification head with parameters $\phi$, $f_{\theta}$ is the feature extractor, $x_j$ is a sample in the support set, and $y_j$ is the label or pseudo-label of that sample. Subsequently, the classification head parameters $\phi$ are quickly adjusted by gradient descent with inner-loop learning rate $\alpha$:
[0135] $$\phi' = \phi - \alpha \nabla_{\phi} \mathcal{L}_{\text{inner}}(\phi) \quad (12).$$
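The inner-loop adaptation can be illustrated with a small sketch in which the features are treated as fixed outputs of the feature extractor and only a linear classification head is updated. The linear softmax head, the zero initialization, the learning rate, and the toy 2-way 2-shot support set are assumptions for illustration; the outer-loop update on the query set is omitted.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def head_logits(W, b, z):
    """Linear classification head: one row of W and one bias per class."""
    return [sum(w_i * z_i for w_i, z_i in zip(row, z)) + b_c for row, b_c in zip(W, b)]

def support_loss(W, b, feats, labels):
    """Mean cross-entropy of the head on the support set."""
    total = 0.0
    for z, y in zip(feats, labels):
        total -= math.log(softmax(head_logits(W, b, z))[y])
    return total / len(feats)

def inner_step(W, b, feats, labels, alpha):
    """One gradient-descent step on the head only; the features stay fixed."""
    n = len(feats)
    gW = [[0.0] * len(W[0]) for _ in W]
    gb = [0.0] * len(b)
    for z, y in zip(feats, labels):
        p = softmax(head_logits(W, b, z))
        for c in range(len(W)):
            err = (p[c] - (1.0 if c == y else 0.0)) / n  # gradient w.r.t. logit c
            gb[c] += err
            for k in range(len(z)):
                gW[c][k] += err * z[k]
    W_new = [[w - alpha * g for w, g in zip(rw, rg)] for rw, rg in zip(W, gW)]
    b_new = [v - alpha * g for v, g in zip(b, gb)]
    return W_new, b_new

# Toy 2-way 2-shot support set of already-extracted features (illustrative values).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
W0, b0 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
loss_before = support_loss(W0, b0, feats, labels)
W1, b1 = inner_step(W0, b0, feats, labels, alpha=0.5)
loss_after = support_loss(W1, b1, feats, labels)
```

Because only the head parameters are touched, a single step needs no gradient through the feature extractor, which is what keeps the inner loop cheap on small support sets.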
[0136] The feature extractor has already been fully optimized through the CFE triple loss during the previous open-set training phase and possesses strong discriminative feature extraction capabilities. In the meta-learning phase, the core task shifts to enabling the classification head to quickly adapt to newly discovered categories. Cross-entropy loss is used as the optimization objective, providing a clear and efficient gradient signal for rapid parameter updates in scenarios with few samples. Furthermore, this avoids calculating the complex Fisher loss and EMD loss on a support set with an extremely small sample size, ensuring the stability and computational efficiency of the inner loop update.
[0137] During the dynamic model expansion process in the meta-learning phase, the system needs to coordinate three updates: the classification head dimension, the feature space structure, and the centroid mechanism. When the client receives the pseudo-labels of the new categories, the first step is to expand the dimensionality of the classification head to accommodate the expanded label space. In the inner loop of meta-learning, the ANIL framework primarily optimizes the classification head, while the feature extractor participates in fine-tuning during the outer loop update to ensure that the feature representations of the new categories are distributed in a coordinated manner with the known categories in the feature space. This design allows the feature extractor to learn a universal feature representation that adapts to both known and new categories, avoiding fragmentation of the feature space. Regarding centroid management, the centroid buffer needs to be expanded synchronously. For each newly discovered category, the client uses the new category center issued by the server as the initial centroid and updates it using an exponentially weighted moving average mechanism.
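The synchronized expansion of the classification head and the centroid buffer can be sketched as follows. The list-based representation, the function names, and in particular the zero initialization of the new head rows are assumptions for illustration; this embodiment does not specify how the added head parameters are initialized.

```python
def expand_head(W, b, num_new):
    """Append num_new output rows to a linear head (zero init is an assumption)."""
    dim = len(W[0])
    W_new = [row[:] for row in W] + [[0.0] * dim for _ in range(num_new)]
    b_new = b[:] + [0.0] * num_new
    return W_new, b_new

def expand_centroids(centroids, new_centers):
    """Append server-issued cluster centers as initial centroids of the new classes."""
    return [c[:] for c in centroids] + [list(c) for c in new_centers]

# Two known classes, one newly discovered category (illustrative values only).
W = [[0.2, -0.1], [0.4, 0.3]]
b = [0.0, 0.1]
centroids = [[0.0, 0.0], [4.0, 0.0]]
server_centers = [[5.0, 5.0]]

W2, b2 = expand_head(W, b, num_new=len(server_centers))
centroids2 = expand_centroids(centroids, server_centers)
```

The existing rows and centroids are copied unchanged, so predictions for known classes are not disturbed by the expansion itself.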
[0138] Regarding the hierarchical weighted aggregation mechanism
[0139] In real-world industrial internet environments, the data collected by various clients often exhibits significant non-independent identically distributed (Non-IID) characteristics. This data heterogeneity primarily stems from differences in device type, operating conditions, deployment area, and task objectives, leading to substantial deviations in data distribution across different clients. To address these challenges, one optional implementation employs a hierarchical weighted aggregation mechanism, combining intra-group square root weighted aggregation with a global sample size weighted average to mitigate the bias caused by Non-IID data, while balancing local adaptability with global representativeness. This mechanism consists of two levels: intra-group aggregation and global aggregation, optimized for the data distribution characteristics of the client layer and edge server layer, respectively.
[0140] Intra-group aggregation phase
[0141] Each edge server merges the parameters of the multiple client model instances under its jurisdiction. Let the $g$-th group contain $N_g$ clients, of which the $i$-th client has $n_{g,i}$ local training samples and uploads model parameters $\theta_{g,i}$. This embodiment introduces a square root weighting mechanism, which enhances the relative influence of clients with small sample sizes by nonlinearly compressing the impact of large sample sizes. The weight coefficient $w_{g,i}$ of each client is defined as:
[0142] $$w_{g,i} = \frac{\sqrt{n_{g,i}}}{\sum_{j=1}^{N_g} \sqrt{n_{g,j}}} \quad (16);$$
[0143] The aggregated intra-group model parameters $\theta_g$ are calculated as:
[0144] $$\theta_g = \sum_{i=1}^{N_g} w_{g,i}\, \theta_{g,i} \quad (17).$$
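A small numerical sketch makes the compression effect of square root weighting visible. The sample counts and parameter vectors below are made-up toy values, and model parameters are represented as flat lists purely for illustration.

```python
import math

def sqrt_weights(sample_counts):
    """Normalized square-root weights, one per client in the group."""
    roots = [math.sqrt(n) for n in sample_counts]
    total = sum(roots)
    return [r / total for r in roots]

def weighted_average(param_vectors, weights):
    """Element-wise weighted sum of equally sized parameter vectors."""
    return [sum(w * p[k] for w, p in zip(weights, param_vectors))
            for k in range(len(param_vectors[0]))]

counts = [100, 400, 2500]                      # toy local sample counts
weights = sqrt_weights(counts)                 # square-root weighting
linear = [n / sum(counts) for n in counts]     # plain sample-size weighting

client_params = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
group_model = weighted_average(client_params, weights)
```

In this toy group the smallest client receives a weight of 0.125 instead of roughly 0.033 under linear weighting, while the largest client drops from roughly 0.833 to 0.625.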
[0145] Global aggregation phase
[0146] The global server receives intra-group aggregation models from K groups. To maintain the statistical representativeness of the global model for the overall data distribution, this embodiment adopts a weighted average strategy based on the total sample size at the global level.
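[0147] Let $n_g = \sum_{i} n_{g,i}$ denote the sum of the numbers of samples of all clients in the $g$-th group, where $n_{g,i}$ is the number of local training samples of the $i$-th client in that group, and let $n = \sum_{g=1}^{K} n_g$ be the total number of samples of all participating clients in the system. Then the weight coefficient of the $g$-th group in the global aggregation is $n_g / n$. Denoting the intra-group aggregated model of the $g$-th group by $\theta_g$, the final model parameters $\theta$ after global aggregation are:
[0148] $$\theta = \sum_{g=1}^{K} \frac{n_g}{n}\, \theta_g \quad (18);$$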
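The two aggregation levels can be combined in a short sketch. The dictionary layout of `groups`, the function names, and the one-dimensional toy parameters are assumptions for illustration only.

```python
import math

def aggregate_group(client_params, client_counts):
    """Intra-group aggregation with normalized square-root weights."""
    roots = [math.sqrt(n) for n in client_counts]
    total = sum(roots)
    dim = len(client_params[0])
    return [sum(r / total * p[k] for r, p in zip(roots, client_params))
            for k in range(dim)]

def aggregate_global(group_models, group_counts):
    """Global aggregation with weights proportional to each group's total samples."""
    total = sum(group_counts)
    dim = len(group_models[0])
    return [sum(n / total * m[k] for n, m in zip(group_counts, group_models))
            for k in range(dim)]

# Two groups (edge servers) with toy one-dimensional client models.
groups = [
    {"params": [[0.0], [4.0]], "counts": [100, 900]},
    {"params": [[8.0]], "counts": [1000]},
]
group_models = [aggregate_group(g["params"], g["counts"]) for g in groups]
group_counts = [sum(g["counts"]) for g in groups]
global_model = aggregate_global(group_models, group_counts)
```

Here the first group's model is 3.0 (weights 0.25 and 0.75 instead of 0.1 and 0.9), and since both groups hold 1000 samples the global model is the plain average of 3.0 and 8.0.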
[0149] This embodiment achieves the dual goals of local debiasing and global fidelity preservation through a collaborative strategy of "intra-group square root weighting and global linear weighting." At the intra-group level, non-linear weighting using the square root of the sample size suppresses the dominance of large-sample clients, mitigates local biases caused by Non-IID data distribution, and enhances the model's adaptability to sparse classes. At the global level, linear weighting aggregation based on the total sample size of each group ensures that the global model fully reflects the overall data scale and structure of the system, maintaining its statistical representativeness. This mechanism helps improve the model's convergence stability and generalization performance in heterogeneous environments.
[0150] The following section uses a specific application scenario to illustrate the overall collaborative workflow of the aforementioned technical features.
[0151] In one optional implementation, the open set model employs a dual-path Kolmogorov-Arnold network (hereinafter referred to as DPKAN) as a feature extractor, and its complete workflow is as follows:
[0152] (I) DPKAN Open Set Intrusion Detection Process
[0153] In the initialization phase, the network parameters are randomly initialized, and the class centroids are initialized to zero vectors or random values. During the training phase, for each batch of input samples, the model first extracts features using the DPKAN feature extractor, then calculates the CFE triple loss and updates the network parameters through backpropagation. Meanwhile, for correctly predicted known class samples, the centroid of the corresponding class is dynamically updated according to the exponential moving average update of formula (1).
[0154] After training is complete, the threshold calculation phase begins. All known class training samples are traversed, and the Euclidean distance from each sample's feature to its nearest known class centroid is calculated, forming a distance set; the k-th percentile of this set is taken as the decision threshold τ.
[0155] During the testing phase, for any new sample, its feature is first extracted, and the minimum distance $d_{\min}$ from this feature to all known class centroids is calculated. If $d_{\min} > \tau$, the sample is classified as an unknown class (i.e., an intrusion behavior); otherwise, it is classified as the nearest known class.
[0156] Finally, the algorithm returns the trained DPKAN model, the dynamic centroids, and the decision threshold τ.
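The centroid maintenance and the test-time decision can be sketched as follows, with features assumed to be already extracted. The class names, the toy centroids, the threshold value, and in particular the convention that the smoothing coefficient `beta` weights the old centroid are assumptions for illustration; the exact form of the update is given by formula (1).

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ema_update(centroid, batch_mean, beta=0.9):
    # Assumed convention: beta weights the old centroid, (1 - beta) the new batch mean.
    return [beta * c + (1.0 - beta) * m for c, m in zip(centroid, batch_mean)]

def classify_open_set(feature, centroids, tau):
    """Return (label, d_min); label is "unknown" when d_min exceeds tau."""
    label, d_min = min(((name, euclidean(feature, c)) for name, c in centroids.items()),
                       key=lambda item: item[1])
    return ("unknown" if d_min > tau else label), d_min

# Hypothetical known classes and threshold (illustrative values only).
centroids = {"normal": [0.0, 0.0], "dos": [4.0, 0.0]}
tau = 1.0

near = classify_open_set([0.5, 0.0], centroids, tau)
far = classify_open_set([10.0, 10.0], centroids, tau)
moved = ema_update([0.0, 0.0], [1.0, 2.0], beta=0.5)
```

A feature close to a known centroid keeps that class label, while a feature far from every centroid is routed to the unknown buffer.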
[0157] (II) Layered Federated Adaptive Intrusion Detection Process Integrating Open Set Detection
[0158] In one alternative implementation, the above-described open set detection mechanism works in conjunction with the hierarchical federated learning framework according to the following process.
[0159] Initialization phase: The initial global model parameters are determined by the federated task publisher, and the model covers a known set of categories. The system adopts a three-tier architecture: client, edge server, and global server.
[0160] Global iteration phase: At the start of the t-th round of global iteration, the global server distributes the current global model to all edge servers, which in turn broadcast the model to the clients participating in the training within their groups.
[0161] Local training phase on the client:
[0162] In rounds in which open set detection is not yet enabled, the client only uses known class data for standard supervised learning to maintain the centroids of each class;
[0163] In rounds in which the open set detection module is enabled, the dynamic threshold τ is used to identify unknown samples, and the unknown features are stored in a buffer.
[0164] After local training is complete, the client uploads the updated model parameters and unknown features from its local buffer to its respective edge server. The edge server aggregates the unknown features uploaded by clients within the group and uploads the aggregated feature set to the global server.
[0165] Feature aggregation and annotation stage:
[0166] After receiving the unknown features uploaded by each edge server, the global server executes a confidence-based category labeling algorithm to generate pseudo-labels and corresponding initial category centers, and then sends this information to each edge server, which in turn forwards it to the client.
[0167] Model expansion and rapid adaptation phase:
[0168] After receiving the pseudo-labels and initial centroids of the new category, the client expands the classification head dimension, merges the pseudo-label samples with the known class samples to construct an expanded training set, and uses the ANIL meta-learning mechanism for rapid adaptive training.
[0169] Model aggregation phase:
[0170] Model aggregation is performed using the aforementioned hierarchical weighting mechanism:
[0171] During the intra-group aggregation phase, the edge server aggregates the model parameters of clients within the group using a square root weighted method;
[0172] During the global aggregation phase, the global server aggregates the model parameters of each edge server using a linear weighted method based on the number of samples to generate an updated global model.
[0173] Iteration and Output:
[0174] Repeat the above process for multiple rounds of federated training until the model performance converges or the preset number of global iterations T is reached. Finally, the algorithm returns the global model parameters obtained after training.
[0175] The above description represents only preferred embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the present invention, and all such improvements and modifications should fall within the scope of protection of the present invention.
Claims
1. An adaptive intrusion detection method for the Industrial Internet, characterized in that it includes the following steps:
Step S1: Deploy an open-set model based on known network attack categories as the initial model on the global server, and distribute the model parameters to each edge server and its corresponding clients;
Step S2: The client trains on the local dataset, identifies unknown network attack samples through the open set detection mechanism, and stores the unknown features in the local buffer;
Step S3: The client uploads the unknown features in the local buffer, which are then forwarded to the global server via the edge server; the global server performs cluster analysis on the aggregated unknown features to generate pseudo-labels and corresponding initial category centers;
Step S4: The client uses a meta-learning mechanism to perform rapid adaptation training with a small number of samples using the pseudo-labeled samples, and obtains updated client-side local model parameters;
Step S5: The edge server aggregates the model parameters of the clients within the group, and the global server aggregates the model parameters of each edge server to generate an updated global model;
Step S6: Repeat steps S2 to S5 until the model performance converges or the preset number of iterations is reached;
Step S7: Deploy the trained global model on the client to perform intrusion detection on real-time network traffic.
2. The adaptive intrusion detection method for the Industrial Internet according to claim 1, characterized in that, in step S1, the open set model employs a dual-branch parallel feature extraction structure, which includes: a first branch used to extract the original local features of the input; a second branch used to extract features through nonlinear transformation after normalizing the input; and a fusion module that merges the outputs of the first branch and the second branch.
3. The adaptive intrusion detection method for the Industrial Internet according to claim 3's parent claim 1, characterized in that, in step S2, when the client trains on the local dataset, it maintains the dynamic centroids of each known network attack category and updates the model parameters using the CFE triple loss function; the dynamic centroids are updated using an exponential moving average method according to formula (1), in which the updated centroid of a known network attack category is obtained from the current centroid of that category and the mean feature of its correctly predicted samples, weighted by a smoothing coefficient.
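4. The adaptive intrusion detection method for the Industrial Internet according to claim 3, characterized in that, in step S2, the CFE triple loss function is expressed as: $$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \lambda_1 \mathcal{L}_{\text{Fisher}} + \lambda_2 \mathcal{L}_{\text{EMD}} \quad (2);$$ where $\mathcal{L}_{\text{total}}$ is the total loss, $\mathcal{L}_{\text{CE}}$ is the cross-entropy loss, $\mathcal{L}_{\text{Fisher}}$ is the Fisher loss, $\mathcal{L}_{\text{EMD}}$ is the regular approximate EMD loss, and $\lambda_1$ and $\lambda_2$ are the loss weight hyperparameters.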
5. The adaptive intrusion detection method for the Industrial Internet according to claim 4, characterized in that the Fisher loss includes an intra-class loss and an inter-class loss combined through a balance factor as expressed in formula (4), the intra-class loss being expressed by formula (5) and the inter-class loss by formula (6); the quantities appearing in these formulas are: the balance factor; the number of known categories; the number of samples of each class; the class center; the sample index; the intermediate layer feature output obtained after a given sample of a given class passes through the model; two different category indexes; and the feature centers of the two classes denoted by those indexes.
6. The method according to claim 4, characterized in that the regular approximate EMD loss is calculated through the following steps: calculating the gradient of the Fisher loss with respect to the input samples, and generating perturbations using the fast gradient sign method to obtain perturbed samples; and calculating the sliced Wasserstein distance between the original sample features and the perturbed sample features as the regular approximate EMD loss.
7. The method according to claim 1, characterized in that, in step S2, the open set detection mechanism uses a dynamic threshold to identify unknown samples, and the dynamic threshold is set as follows: in the model validation stage, the distances from the features of all known class training samples to their nearest known class centroids are collected to form a known class distance set, and the preset percentile of this set is taken as the decision threshold.
8. The method according to claim 1, characterized in that, in step S3, the cluster analysis performed by the global server on the aggregated unknown features specifically includes: using a density clustering algorithm to cluster the unknown features, automatically determine the number of effective clusters, and identify noise samples; using the cluster centers of the density clustering results as initial cluster centers, performing K-means clustering for fine-grained partitioning, and assigning a globally unique pseudo-label to each final cluster; and, after assigning the pseudo-labels, forming a high-confidence subset by selecting a preset proportion of samples from each cluster that are closest to the cluster center, and sending the high-confidence subset, its corresponding pseudo-labels, and the cluster center of each cluster to each client, the cluster center serving as the initial category center.
9. The method according to claim 1, characterized in that, in step S4, the meta-learning mechanism is an ANIL-based meta-learning mechanism, and the rapid adaptation training with few samples using the pseudo-labeled samples is performed as follows: the client matches the received pseudo-labels with the local set of unknown features to construct a pseudo-label dataset, and merges it with the known class dataset to form an expanded training set; few-shot tasks are constructed by sampling from the expanded training set, each meta-task consisting of a support set and a query set; during the inner loop adaptation phase, the classification head parameters are updated by gradient descent on the support set while the feature extractor parameters remain fixed; and during the outer loop update phase, the loss is calculated on the query set and the feature extractor parameters are updated.
10. The method according to claim 1, characterized in that the hierarchical model aggregation in step S5 includes: an intra-group aggregation phase, in which each edge server merges the parameters of the multiple client models under its jurisdiction using a square root weighting mechanism based on the number of samples; and a global aggregation phase, in which the global server receives the intra-group aggregation models from each edge server and performs aggregation using a linear weighting mechanism based on the total number of samples.