Scalable Internet of Things device traffic identification method and device for limited samples

By constructing a hybrid attention mechanism based on metric learning and prototype networks, the problems of dependence on a large number of labeled samples and catastrophic forgetting in IoT device traffic identification are solved. It achieves efficient and stable extended identification of new device categories under limited sample conditions, and is suitable for dynamic IoT network management.

CN122247666A · Pending · Publication Date: 2026-06-19 · SOUTHEAST UNIV

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
SOUTHEAST UNIV
Filing Date
2026-03-10
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing IoT device traffic identification methods suffer from several problems when faced with the continuous dynamic access of device categories and the limited number of new category samples. These problems include a strong reliance on a large number of labeled samples, high costs for model expansion and updates, and the susceptibility to catastrophic forgetting, leading to a decline in identification accuracy.

Method used

An incremental feature learning architecture is built on metric learning, prototype networks, and a hybrid attention mechanism. Packet-level features and payload information are combined into a multi-feature time-series sample set, and a triplet network with global attention mechanisms learns discriminative embeddings so that new device categories can be recognized efficiently. A prototype update attention network then keeps the discrimination boundary stable when only a small number of labeled samples are available.

Benefits of technology

Under limited sample conditions, the model can maintain high recognition accuracy, alleviate the problem of catastrophic forgetting, and achieve rapid adaptation and stable recognition of new device categories, making it suitable for dynamic IoT network management and security supervision.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN122247666A_ABST

Abstract

This invention proposes a scalable IoT device traffic identification method and apparatus for limited samples. The method includes: data preprocessing and feature sample set construction for IoT traffic; training a basic feature extractor; pre-training a scalable IoT device identification model; model updating and IoT device traffic classification. Compared to existing identification schemes that rely on large amounts of data or frequent full retraining, this invention transforms the adaptation process of new categories into a controllable update of class prototypes through a joint mechanism of "metric embedding" and "prototype adaptive update." This allows the model to maintain a stable discrimination boundary even when new access devices can only obtain a small number of labeled traffic samples, mitigating catastrophic forgetting and improving scalability and deployability at the mechanism level. This method achieves an accuracy of over 90% under most learning conditions and can realize continuous device identification and timely monitoring in dynamically changing IoT environments.

Description

Technical Field

[0001] This invention belongs to the field of cyberspace security technology and relates to packet-level features, few-sample incremental learning, and attention mechanism technology. Specifically, it relates to a scalable IoT device traffic classification method and device for limited samples.

Background Technology

[0002] The Internet of Things (IoT) technology enables widespread connectivity between the physical and digital worlds by connecting various devices to networks. With the deepening application of IoT in industries such as manufacturing, healthcare, home, and transportation, the number of connected devices and the scale of supported services are growing rapidly, making it one of the key infrastructures carrying important information and business. At the same time, compared to traditional networks, IoT networks are characterized by larger scale, more heterogeneous device types and communication protocols, and more dynamic access behaviors, which places higher demands on network management, security maintenance, and performance optimization.

[0003] Due to the widespread characteristics of IoT devices, such as limited computing and storage resources, difficulty in firmware updates, and weak factory security mechanisms, they are chronically vulnerable to attack. They can become targets or be exploited by attackers as attack vectors. For example, attackers can exploit inherent vulnerabilities in devices to implant ransomware or spyware to steal sensitive information; infect devices with malware and use their resources for cryptocurrency mining; use controlled devices to build botnets to carry out large-scale distributed attacks; and even launch continuous infiltration and damage against critical infrastructure equipment. Accurate identification of the types of devices accessing IoT networks is crucial in addressing these threats. On the one hand, it provides a basis for device-side risk positioning and strategy formulation for security protection systems; on the other hand, continuous monitoring of traffic patterns helps detect abnormal behavior and malicious activities, thereby improving the ability to detect, prevent, and respond to network security incidents. Therefore, IoT access device type identification is a fundamental capability for achieving dynamic supervision and security governance of IoT networks.

[0004] Currently, in the research and application of traffic identification for IoT devices, deep learning-based methods have achieved high recognition accuracy in specific datasets and scenarios due to their automatic feature learning capabilities. However, in real network environments, IoT devices exhibit continuous dynamic access and constantly expanding categories, and existing mainstream methods still have limitations: On the one hand, they are highly dependent on samples. Existing models typically rely on a large amount of labeled data for training, but in actual deployments, newly accessed devices often only obtain a small number of effective traffic samples in the initial stage, making it difficult to support sufficient training and stable convergence of deep models, thus leading to unstable or unusable recognition performance; on the other hand, they have poor scalability and are prone to catastrophic forgetting. When the system needs to support new device categories, it often needs to acquire all historical data to reconstruct the model or retrain it for a long time. Otherwise, while learning the characteristics of new devices, the model will significantly reduce the recognition accuracy of existing devices, making it difficult to meet the long-term stable operation requirements in continuously evolving network environments.

[0005] To achieve dynamic monitoring and security management of IoT networks, and for scenarios where device types are dynamically accessed and the number of newly added category samples is limited, this invention proposes a deep learning identification method based on metric learning, prototype networks, and a hybrid attention mechanism. This method uses packet-level features and payload to perform scalable identification of IoT device types, evaluates the classification results of the model based on multiple evaluation metrics, and updates network parameters to ensure that the model exhibits good scalability when facing a limited number of new category samples.

Summary of the Invention

[0006] To address the problems of existing identification methods in scenarios with continuous and dynamic access of device categories, such as strong dependence on a large number of labeled samples, high cost of model expansion and updates, and susceptibility to catastrophic forgetting leading to a decline in the recognition rate of existing categories, this invention proposes a scalable IoT device traffic identification method and device for limited samples. By constructing an incremental feature learning architecture, the model can achieve efficient expansion and identification of new device categories when there are only a few labeled traffic samples for new categories, while effectively preserving the recognition accuracy of existing devices.

[0007] This invention first acquires IoT device traffic data for preprocessing and feature processing, constructing a multi-feature time-series sample set from different feature information and dividing it into a base class dataset and a newly added class dataset. Next, a metric learning model with a hybrid attention mechanism is constructed and trained to obtain a basic feature extractor. Then, a prototype update attention network is built based on a multi-head self-attention mechanism, and the model is pre-trained using a random pseudo-incremental training strategy. Finally, the model is trained using newly added class samples and their labels, and test samples are input into the trained classification model. Based on the experimental results of IoT device recognition under different small-sample incremental learning conditions, the network model is adaptively updated, exhibiting good scalability when facing a limited number of new class samples. Compared to existing recognition schemes that rely on large amounts of data or frequent full retraining, this invention, through a joint mechanism of "metric embedding" and "prototype adaptive update," transforms the adaptation process of new categories into a controllable update of class prototypes. This allows the model to maintain a stable discrimination boundary even when only a small number of labeled traffic samples are available for newly connected devices, mitigating catastrophic forgetting and improving scalability and deployability at the mechanism level.

[0008] To achieve the above objectives, the technical solution adopted by the present invention is as follows:

[0009] A scalable IoT device traffic identification method for limited samples includes the following steps:

[0010] (1) Obtain the IoT device traffic dataset and preprocess it; extract the packet-level features and payload information of shorter traffic segments and build them into a multi-feature IoT traffic time-series sample set; divide the device categories into base classes and newly added classes, and split the samples into a training set and a test set.

[0011] (2) Build a metric learning model from a triplet network, add a hybrid attention mechanism to each sub-network of the triplet network, construct triplet samples from the base class training samples as input to the metric learning model, and obtain the basic feature extractor after training.

[0012] (3) Use a multi-head self-attention mechanism to build a prototype update attention network, and combine the basic feature extractor trained in step (2) with the prototype update attention network and the classification layer to build a deep learning model. Use a random pseudo-incremental training strategy to input the base class samples into the deep learning model for pre-training.

[0013] (4) Use the newly added category samples and their labels to train and update the deep learning model pre-trained in step (3), and input the test samples into the trained deep learning model to classify IoT device traffic.

[0014] Furthermore, step (1) specifically includes the following sub-steps:

[0015] (1.1) Process the IoT device traffic dataset, distinguish the traffic of each device type according to the MAC address, and add tags to the traffic;

[0016] (1.2) Extract packet-level features and payload information from shorter traffic segments;

[0017] (1.3) For each type of feature information extracted, a time series sequence is constructed according to the time dimension. Different feature sequences of the same data packet are superimposed into a two-dimensional arrangement to construct multiple multivariate feature time series sample sets.

[0018] (1.4) Divide the device categories of the multi-feature time-series samples into base classes and newly added classes, and divide all feature sample sets into a training set and a test set.
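Sub-step (1.1) labels traffic by MAC address. The following is a minimal sketch of that grouping, assuming each parsed packet is a dict with hypothetical `src_mac` and `dst_mac` keys and a hypothetical MAC-to-device table (the real UNSW device list is not reproduced here). It also records a +1/-1 direction depending on whether the device is the sender or the receiver; that sign convention is an assumption, since the text only says Direction encodes the packet flow.

```python
from collections import defaultdict

# Hypothetical MAC-to-device table; the actual UNSW device MACs are not listed here.
DEVICE_BY_MAC = {
    "aa:bb:cc:00:00:01": "smart_camera",
    "aa:bb:cc:00:00:02": "smart_plug",
}

def label_packets(packets, device_by_mac):
    """Group packets by device label using the source or destination MAC.

    Returns {label: [(packet, direction), ...]}; direction is +1 when the device
    is the sender and -1 when it is the receiver (assumed convention).
    Packets whose MACs match no known device are dropped.
    """
    grouped = defaultdict(list)
    for pkt in packets:
        for mac, direction in ((pkt["src_mac"], 1), (pkt["dst_mac"], -1)):
            label = device_by_mac.get(mac.lower())
            if label is not None:
                grouped[label].append((pkt, direction))
                break
    return dict(grouped)

packets = [
    {"src_mac": "AA:BB:CC:00:00:01", "dst_mac": "ff:ff:ff:ff:ff:ff"},  # camera sends
    {"src_mac": "11:22:33:44:55:66", "dst_mac": "aa:bb:cc:00:00:02"},  # plug receives
    {"src_mac": "11:22:33:44:55:66", "dst_mac": "66:55:44:33:22:11"},  # unknown, dropped
]
labeled = label_packets(packets, DEVICE_BY_MAC)
```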

[0019] Furthermore, in step (2), the sub-network includes two global attention modules, each of which is followed by a one-dimensional convolutional network to form an attention network. Each convolutional layer is followed by a batch normalization layer and an activation function, and the sub-network ends with a global pooling layer. The global attention module includes a cascaded channel attention and spatial attention module.
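The cascaded channel-then-spatial global attention module can be sketched as a NumPy forward pass. This is one plausible reading, loosely following GAM-style global attention; the patent does not specify these details. Assumptions: the channel MLP is applied at every time step, the two spatial convolutions run along the time axis, batch normalization is omitted because the sketch only runs inference with untrained random weights, and the reduction ratio 2, kernel size 7, and the 30 × 75 input (5 scalar features plus 25 payload-byte rows, over 75 packets) are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(z, 0.0)

def conv1d_same(x, w):
    """Cross-correlation along time with zero 'same' padding.
    x: (C_in, T); w: (C_out, C_in, K) with odd K -> (C_out, T)."""
    k = w.shape[2]
    t = x.shape[1]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    return sum(w[:, :, j] @ xp[:, j:j + t] for j in range(k))

def channel_attention(x, w1, w2):
    # Two fully connected layers (C -> C/r -> C) applied at every time step,
    # ReLU in between, Sigmoid for weights in (0, 1), then element-wise reweighting.
    return x * sigmoid(w2 @ relu(w1 @ x))

def spatial_attention(x, k1, k2):
    # Two convolutions fuse neighbouring time steps; ReLU only after the first,
    # so the Sigmoid input can be negative. BN is omitted in this sketch.
    return x * sigmoid(conv1d_same(relu(conv1d_same(x, k1)), k2))

def global_attention(x, p):
    # Cascaded: channel attention first, then spatial attention on its output.
    return spatial_attention(channel_attention(x, p["w1"], p["w2"]), p["k1"], p["k2"])

rng = np.random.default_rng(0)
C, T, R, K = 30, 75, 2, 7  # channels, time steps, reduction ratio, kernel size (assumed)
params = {
    "w1": rng.normal(scale=0.1, size=(C // R, C)),
    "w2": rng.normal(scale=0.1, size=(C, C // R)),
    "k1": rng.normal(scale=0.1, size=(C // R, C, K)),
    "k2": rng.normal(scale=0.1, size=(C, C // R, K)),
}
x = rng.normal(size=(C, T))
y = global_attention(x, params)  # same shape as x, each entry scaled by two weights in (0, 1)
```

Because both attention stages multiply by Sigmoid outputs, the module can only attenuate features relative to each other; this is what lets it emphasize the more discriminative regions.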

[0020] Furthermore, step (2) specifically includes the following sub-steps:

[0021] (2.1) Construct a metric learning model. The model consists of three sub-networks with shared parameters and identical structures. Anchor samples are selected for each class from the base class training samples, and corresponding samples of the same class and of different classes are chosen to form triplet samples as input to the triplet network.

[0022] (2.2) The input sample first enters the channel attention module in the sub-network, and then passes through two fully connected layers to learn the weights of different channel information. The ReLU activation function is used for non-linear processing between the two fully connected layers, and the normalized weights are generated by the Sigmoid activation function. The channel attention weights are then applied to the sample data by multiplication.

[0023] (2.3) The new data after being processed by the channel attention module is fed into the spatial attention module. Two convolutional layers are used to fuse spatial information. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function for processing. Finally, the normalized weights are generated by the Sigmoid activation function and then multiplied and weighted onto the sample data to complete the data weight update.

[0024] (2.4) The model learns the complex interrelationships between multi-feature time-series data in the input samples and uses the distance between samples to optimize and update the sub-network parameters;

[0025] (2.5) Fix the parameters of any trained sub-network and use it as the basic feature extractor for the input samples.
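Sub-step (2.1) builds triplets from the base class training samples. A minimal sketch of the sampling, assuming a uniformly random anchor class, a positive drawn from the anchor's class, and a negative drawn from any other class; the section does not say whether hard or semi-hard negatives are mined, and `sample_triplets` is a hypothetical helper.

```python
import random

def sample_triplets(labels, n_triplets, seed=0):
    """Return (anchor, positive, negative) index triples over base-class samples."""
    rnd = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    # Only classes with at least two samples can supply an anchor and a distinct positive.
    anchor_classes = [c for c, idxs in by_class.items() if len(idxs) >= 2]
    triplets = []
    for _ in range(n_triplets):
        pos_class = rnd.choice(anchor_classes)
        neg_class = rnd.choice([c for c in by_class if c != pos_class])
        anchor, positive = rnd.sample(by_class[pos_class], 2)
        negative = rnd.choice(by_class[neg_class])
        triplets.append((anchor, positive, negative))
    return triplets

labels = ["cam"] * 4 + ["plug"] * 4 + ["hub"] * 4
trips = sample_triplets(labels, n_triplets=5)
```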

[0026] Furthermore, step (3) specifically includes the following sub-steps:

[0027] (3.1) Use the trained basic feature extractor to extract embedding representations for each type of IoT device traffic in the dataset, calculate the mean of the embedding representations for each type of device as the prototype of that type, and use it for subsequent prototype updates;

[0028] (3.2) Construct a prototype update attention network. The network includes two attention heads, each containing three linear projections: query, key, and value. The prototypes of the existing categories are stacked to form a matrix, and one zero-vector column per new category is appended to the end of this matrix to obtain the old category prototype matrix. Replacing these zero columns with the prototypes of the new categories gives the prototype matrix to be updated. The old category prototype matrix is fed into the key and value linear projections to obtain the key matrix and the value matrix, and the prototype matrix to be updated is fed into the query linear projection to obtain the query matrix.

[0029] (3.3) Transpose the query matrix and perform a dot product operation with the key matrix to learn the relationship between prototypes and output the relevance score matrix;

[0030] (3.4) Scale the relevance score matrix according to the feature dimension, then normalize it with the Sigmoid activation function to obtain the transition coefficients;

[0031] (3.5) The transition coefficients are weighted onto the value matrix by multiplication to complete the weight update of the prototype in the current attention head;

[0032] (3.6) The prototype representations obtained by the attention heads are concatenated, and the concatenated matrix is input into a linear projection layer for linear transformation. Integrating the outputs of the attention heads in this way learns information at different levels between prototypes and completes the overall update of the prototype matrix.

[0033] (3.7) Randomly sample the samples of each class in the base class training set, divide them into a support set and a query set, and repeat sub-steps (3.1)-(3.6) on samples randomly drawn from the support set. Between (3.1) and (3.2), construct pseudo base classes and pseudo incremental classes: randomly divide the base class sample set into pseudo base classes and pseudo incremental classes, remove the prototypes of the pseudo incremental classes from the base class prototypes obtained in sub-step (3.1), and use the remaining prototypes as the pseudo base class prototypes. Repeat the above operation multiple times.

[0034] (3.8) Select category samples that have appeared in the training phase from the query set and input them into the classification layer for classification test. The classification layer includes a fully convolutional network with two convolutional layers. After the first convolutional layer, a batch normalization layer, a ReLU activation function and a dropout layer are connected. Then, the Softmax function is used to obtain the classification result.

[0035] (3.9) Repeat sub-steps (3.7) and (3.8) to perform pre-training and continuously update the model parameters.
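Sub-steps (3.1) and (3.7) can be sketched together: a class prototype is the mean of its embeddings, and each pseudo-incremental episode splits every base class into support and query samples, computes prototypes from the support set, and holds out some base classes as pseudo-incremental ones. The shot count `k_shot`, the number of pseudo-incremental classes, and both helper names are hypothetical; random vectors stand in for the embeddings the basic feature extractor would produce.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Step (3.1): mean embedding of each class -> {label: prototype}."""
    labels = np.asarray(labels)
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels).tolist()}

def pseudo_incremental_episode(embeddings, labels, n_pseudo_new, k_shot, rng):
    """Step (3.7): support/query split per class, then hold out pseudo-incremental classes."""
    labels = np.asarray(labels)
    classes = np.unique(labels).tolist()
    pseudo_new = set(rng.choice(classes, size=n_pseudo_new, replace=False).tolist())
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot].tolist())
        query.extend(idx[k_shot:].tolist())
    protos = class_prototypes(embeddings[support], labels[support])
    # Remaining prototypes act as pseudo base classes; held-out ones as pseudo increments.
    pseudo_base = {c: p for c, p in protos.items() if c not in pseudo_new}
    pseudo_inc = {c: p for c, p in protos.items() if c in pseudo_new}
    return pseudo_base, pseudo_inc, np.array(query)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(6), 10)   # 6 base classes, 10 samples each
emb = rng.normal(size=(60, 16))        # stand-in embeddings
pseudo_base, pseudo_inc, query_idx = pseudo_incremental_episode(
    emb, labels, n_pseudo_new=2, k_shot=5, rng=rng)
```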

[0036] Furthermore, step (4) specifically includes the following sub-steps:

[0037] (4.1) Input the base class samples and newly added class samples from the above multi-feature time series sample set into the model, train the model using training samples and labels, and update the corresponding prototype and network parameters;

[0038] (4.2) Select all category samples that have appeared in the training phase from the test samples and input them into the trained classification model to achieve scalable multi-classification of IoT device traffic.

[0039] Furthermore, the method also includes evaluating the model's classification results with multiple indicators, together with its scalability when facing a limited number of new class samples.
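The section does not list which indicators are used, and the figure description refers only to "various metrics". As a hedged example, accuracy plus macro-averaged precision, recall, and F1 (common choices for multi-class traffic identification, assumed here) can be computed with the standard library alone:

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all labels seen."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    prec, rec, f1 = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        prec.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    n = len(labels)
    return {"accuracy": accuracy, "macro_precision": sum(prec) / n,
            "macro_recall": sum(rec) / n, "macro_f1": sum(f1) / n}

scores = evaluate(["cam", "cam", "plug", "plug"], ["cam", "plug", "plug", "plug"])
# accuracy 0.75; macro recall (0.5 + 1.0) / 2 = 0.75
```

Macro averaging weights every device class equally, which keeps a few newly added classes from being drowned out by the larger set of base classes.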

[0040] The present invention also provides a scalable IoT device traffic identification device for limited samples, including a memory, a processor, and a computer program stored in the memory, wherein the processor executes the computer program to implement the steps of the scalable IoT device traffic identification method for limited samples provided by the present invention.

[0041] Compared with the prior art, the present invention has the following advantages and beneficial effects:

[0042] 1. This invention possesses comprehensive feature representation capabilities. By introducing a hybrid attention mechanism with a serial structure and a triplet metric learning network, this invention can collaboratively capture the deep correlation of traffic features in the channel and spatial dimensions, thereby enhancing the discriminativeness of traffic representation of different devices in scenarios with limited traffic samples, making the model's recognition accuracy better than existing methods in scenarios with few samples.

[0043] 2. This invention achieves an efficient incremental update mechanism. It constructs a prototype update attention network based on the prototype network, transforming the access of new devices from "global parameter retraining / reconstruction" to "local incremental update of the prototype space". Through multi-head self-attention, it adaptively weights different prototypes and their different positions in the feature space, automatically learns the relationship between prototypes and dynamically adjusts the prototype distribution, so that it can still adapt quickly when there are only a few new class samples, reducing the update computation overhead and response latency.

[0044] 3. This invention effectively alleviates the problem of catastrophic forgetting. It constructs a multi-feature temporal sample set using data packet-level features and payload information. Based on triplet networks and global attention mechanisms, it effectively utilizes the relationships between multi-features and the temporal information of single features to learn discriminative embedding representations while retaining the memory of old class knowledge. In the incremental update stage, a transition coefficient and prototype update adaptation mechanism are introduced to constrain and guide the update direction and magnitude of old class prototype knowledge. This effectively maintains the old class discrimination boundary while learning new device features, thus alleviating the problem of catastrophic forgetting.

[0045] 4. Under most few-sample learning conditions, the accuracy of identifying new IoT devices can still be maintained above 90%, making it suitable for dynamically evolving IoT network management and security supervision scenarios, and enabling continuous device identification and timely supervision.

Attached Figure Description

[0046] Figure 1 A schematic diagram illustrating the framework of the scalable IoT device traffic identification method for limited samples provided by the present invention;

[0047] Figure 2 A schematic diagram of a scalable IoT device classification model structure;

[0048] Figure 3 The model's performance on various metrics under different learning conditions; where (a) represents the number of incremental class training samples K=20, (b) represents the number of incremental class training samples K=10, (c) represents the number of incremental class training samples K=5, and (d) represents the number of incremental class training samples K=1.

[0049] Figure 4 Comparison of different models and this invention on different performance metrics for scalable IoT traffic identification.

Detailed Implementation

[0050] The technical solutions provided by the present invention will be described in detail below with reference to specific embodiments. It should be understood that the following specific embodiments are only used to illustrate the present invention and are not intended to limit the scope of the present invention.

[0051] This invention proposes a scalable IoT device traffic identification method for limited samples, whose classification framework is shown in Figure 1. The study is divided into four parts. The first part is data preprocessing and feature sample set construction. Specifically, the UNSW dataset is obtained, traffic from different device types is distinguished by MAC address and labeled, packet-level features and payload information are extracted from shorter traffic segments, a time-series sequence is constructed for each extracted feature, and the different feature sequences of the same data packets are stacked into a two-dimensional arrangement to construct multiple multivariate feature time-series sample sets. Device categories are divided into base and newly added categories in a 3:1 ratio, and all sample sets are divided into training and test sets in a 4:1 ratio. The second part trains the basic feature extractor. Specifically, a triplet network is constructed from three neural networks that share parameters and all use global attention modules, and the base class training samples are organized into triplets as its input. Each sub-network mainly consists of two global attention modules, each combining channel and spatial attention mechanisms and followed by a one-dimensional convolutional network; each convolutional layer is followed by a batch normalization layer and an activation function. The global attention network produces normalized weights for the feature maps, which are used to reweight the input features. The sub-network ends with a global pooling layer, and backpropagation is used to optimize the sub-networks. This part yields the basic feature extractor.
The third part constructs a deep learning model from the basic feature extractor, a prototype update attention network, and a classification layer. Specifically, the basic feature extractor extracts embeddings for each device class, and their mean is taken as the prototype of that class. A prototype update attention network consisting of two self-attention heads is constructed; each head includes three linear transformations (query, key, and value) as well as the transition coefficient calculation. By integrating the outputs of the attention heads, information at different levels between prototypes is learned and the prototypes are updated. The updated prototypes serve as the basis for classification and are input into the classification layer, which consists of two convolutional layers, with a batch normalization layer, an activation function, and a dropout layer after the first. Finally, a random pseudo-incremental training strategy is used to pre-train the model. The fourth part is the scalable multi-classification of IoT device traffic. Specifically, the model is trained with the training samples and labels of the newly added categories, and the corresponding prototypes and network parameters are updated. Test samples from all categories that appeared during training are then input into the trained classification model to achieve scalable multi-classification of IoT device traffic, and the classification results are compared using various evaluation metrics. The network model is adaptively updated based on the IoT device recognition results under different small-sample incremental learning conditions and shows good scalability. The classification model structure of this invention is shown in Figure 2.
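The 3:1 category split and the 4:1 train/test split can be sketched as follows. The embodiment uses 18 categories with 500 samples each; since 18 does not divide evenly in a 3:1 ratio, this sketch floors the base-class count (13 base, 5 new), which is an assumption because the text does not say how the remainder is handled. For 500 samples per category, a 4:1 split gives 400 training and 100 test samples. The placeholder category names are hypothetical.

```python
import random

def split_classes(class_names, base_parts=3, new_parts=1, seed=0):
    """Shuffle device categories and split them base:new in the given ratio (floored)."""
    names = list(class_names)
    random.Random(seed).shuffle(names)
    n_base = len(names) * base_parts // (base_parts + new_parts)
    return names[:n_base], names[n_base:]

def split_samples(sample_ids, train_parts=4, test_parts=1, seed=0):
    """Shuffle one category's samples and split them train:test in the given ratio."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * train_parts // (train_parts + test_parts)
    return ids[:n_train], ids[n_train:]

categories = [f"device_{i:02d}" for i in range(18)]  # placeholder names
base, new = split_classes(categories)
train, test = split_samples(range(500))
```

Integer arithmetic on the ratio parts avoids floating-point rounding surprises when computing the split points.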

[0052] Specifically, the present invention provides a scalable IoT device traffic identification method for limited samples, comprising the following steps:

[0053] (1) Obtain the Internet of Things traffic dataset UNSW, and perform data preprocessing and feature sample set construction on shorter traffic segments.

[0054] The specific process for this step is as follows:

[0055] (1.1) Process the UNSW dataset, distinguish the traffic of each device type according to the MAC address, add labels to the traffic, and obtain 18 categories of IoT traffic for subsequent verification of classification results;

[0056] (1.2) The first 25 bytes of the payload of each data packet are extracted, and the packet-level features and payload information in the shorter traffic segments are extracted and processed for subsequent sample construction. The names and meanings of the 6 extracted IoT traffic features are shown in the table below:

[0057]
| Feature name | Feature meaning |
|---|---|
| Interval | Time interval between data packets |
| Protocol | Data packet protocol |
| Direction | Data packet flow direction |
| TTL | Packet lifetime (time to live) |
| Length | Data packet payload length |
| Payload | Payload byte content of the data packet |

[0058] (1.3) Set the short traffic segment contained in a single sample to 75 data packets. From the extracted feature information, construct a time-series sequence for each feature along the time dimension, and ensure that the first value of each Interval sequence slice after cutting is 0, so that every Interval slice is a sequence starting with 0.

[0059] (1.4) The different feature sequences of the same data packets are stacked, so that the six features of every 75 data packets form a two-dimensional arrangement, producing multiple multi-feature time-series sample sets. This two-dimensional structure of the multi-feature IoT traffic time-series sample provides the structural foundation for subsequent feature learning. Taking 5 data packets as an example, a multi-feature time-series sample looks as follows:

[0060]
| Feature name | Feature meaning | Specific data (5 data packets as an example) |
|---|---|---|
| Interval | Time interval between data packets | [0, 0.010482, 0.000115, 0.236551, 8.395514] |
| Protocol | Data packet protocol | [6.0, 6.0, 6.0, 6.0, 17.0] |
| Direction | Data packet flow direction | [-1.0, 1.0, 1.0, -1.0, 1.0] |
| TTL | Packet lifetime (time to live) | [212.0, 64.0, 64.0, 212.0, 64.0] |
| Length | Data packet payload length | [41.0, 0.0, 41.0, 0.0, 159.0] |
| Payload | Payload byte content of the data packet | [17,3,…, 17,3,…, 0,0,…, 50,4f,…, 48,54,…] |

[0061] (1.5) The 18 categories of multi-feature time-series samples are divided into base device categories and newly added device categories in a 3:1 ratio. Each category has 500 samples, and the feature sample sets are divided into training and test sets in a 4:1 ratio.
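Sub-steps (1.3)-(1.4) can be sketched in NumPy: cut one device's packet stream into 75-packet windows, recompute Interval inside each window so that it starts at 0, and stack the features into a two-dimensional sample. Assumptions: windows do not overlap, an incomplete trailing window is dropped, and the Payload feature (first 25 bytes, zero-padded) is spread over 25 rows, giving 5 + 25 = 30 rows per sample. The section does not say how the 25 payload bytes are laid out in the two-dimensional arrangement, and `build_samples` is a hypothetical helper.

```python
import numpy as np

WINDOW = 75          # packets per sample, as in sub-step (1.3)
PAYLOAD_BYTES = 25   # first 25 payload bytes per packet, as in sub-step (1.2)

def payload_rows(payloads):
    """(PAYLOAD_BYTES, n_packets) array of the first 25 bytes, zero-padded."""
    cols = [np.frombuffer(p[:PAYLOAD_BYTES].ljust(PAYLOAD_BYTES, b"\x00"), dtype=np.uint8)
            for p in payloads]
    return np.stack(cols, axis=1).astype(float)

def build_samples(timestamps, protocol, direction, ttl, length, payloads):
    """Cut one device's packet stream into samples of shape (5 + 25, WINDOW)."""
    timestamps = np.asarray(timestamps, dtype=float)
    scalars = np.vstack([np.asarray(f, dtype=float) for f in (protocol, direction, ttl, length)])
    samples = []
    for start in range(0, len(timestamps) - WINDOW + 1, WINDOW):
        sl = slice(start, start + WINDOW)
        ts = timestamps[sl]
        interval = np.diff(ts, prepend=ts[0])  # first Interval value in every window is 0
        samples.append(np.vstack([interval, scalars[:, sl], payload_rows(payloads[sl])]))
    return np.stack(samples)

rng = np.random.default_rng(0)
n = 160  # enough for two full windows; the last 10 packets are dropped
ts = np.cumsum(rng.exponential(0.1, size=n))
X = build_samples(ts, [6.0] * n, rng.choice([-1.0, 1.0], size=n), [64.0] * n,
                  [41.0] * n, [b"\x17\x03\x03" + bytes(38)] * n)
```

Recomputing Interval per window, rather than slicing a stream-wide interval sequence, is what guarantees the "starts with 0" property from sub-step (1.3).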

[0062] (2) A triplet network model is constructed using a global attention mechanism. Specifically, three sub-networks with shared parameters, each containing global attention modules, are constructed. Triplet samples built from the base class training samples are fed into the channel attention and spatial attention modules to obtain normalized weights for the feature maps, and the input features are reweighted with these weights. A one-dimensional convolutional network is added after each global attention module to form an attention network, and each convolutional layer is followed by a batch normalization layer and an activation function. Each sub-network ends with a global pooling layer that outputs the embedding. After training, the basic feature extractor is obtained. This step introduces a cascaded channel and spatial attention module. The rationale is that IoT device traffic features contribute differently along different dimensions: the hybrid attention mechanism lets the model focus on the more discriminative key feature regions and capture the complex interrelationships between different IoT traffic features, so that robust feature embeddings can be built even when the sample size is insufficient. This improves the model's extraction accuracy for heterogeneous traffic features and enhances its ability to retain old knowledge.

[0063] The specific process for this step is as follows:

[0064] (2.1) Construct a metric learning model. The model consists of three neural networks with shared parameters and the same structure. Anchor samples are selected for each category from the base class training samples, and corresponding samples of the same class and of different classes are chosen to form triplet samples as input to the triplet network.

[0065] (2.2) Based on (2.1), a hybrid attention mechanism is introduced. Each sub-network consists of two attention networks containing a global attention module. The global attention module includes a cascaded channel attention module and a spatial attention module. The input sample first enters the channel attention module and then passes through two fully connected layers to learn the weights of different channel information. The ReLU activation function is used for non-linear processing between the two fully connected layers. Normalized weights are generated by the Sigmoid activation function, and the channel attention weights are applied to the sample data by multiplication.

[0066] (2.3) Based on (2.2), the new data processed by the channel attention module is further fed into the spatial attention module. Two convolutional layers are used to fuse the spatial relationships between features, and a spatial weight map is generated to achieve precise focusing on key traffic feature regions. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function for processing. Finally, normalized weights are generated by the Sigmoid activation function and then multiplied and weighted onto the sample data to complete the data weight update.

[0067] (2.4) Based on (2.3), a one-dimensional convolutional network is added after each global attention module to form an attention network. The two convolutional layers have 256 and 128 filters and kernel sizes of 5 and 3, respectively. Each convolutional layer is followed by a batch normalization layer and a ReLU activation function.
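To make the layer sizes in (2.4) concrete, the sketch below tracks the output shape and learnable parameter count of the two convolutional layers (256 filters with kernel size 5, then 128 filters with kernel size 3) for an input of L packets by C channels. The padding mode, the presence of convolution biases, and counting two learnable parameters (scale and shift) per batch normalization channel are assumptions; the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ConvSpec:
    filters: int
    kernel_size: int


# Layer sizes stated in (2.4).
ATTENTION_CONVS = [ConvSpec(256, 5), ConvSpec(128, 3)]


def conv_stack_summary(length: int, channels: int, specs: List[ConvSpec],
                       padding: str = "same") -> Tuple[List[Tuple[int, int]], int]:
    """Return the (L, C) shape after each conv layer and the total learnable
    parameter count (conv weights + biases + BN scale/shift)."""
    shapes: List[Tuple[int, int]] = []
    params = 0
    for spec in specs:
        if padding == "valid":
            length = length - spec.kernel_size + 1  # stride 1, no padding
        elif padding != "same":
            raise ValueError(f"unsupported padding: {padding}")
        params += spec.kernel_size * channels * spec.filters + spec.filters
        params += 2 * spec.filters  # batch normalization gamma and beta
        channels = spec.filters
        shapes.append((length, channels))
    return shapes, params
```

For a hypothetical input of 100 packets with 10 channels, "same" padding yields shapes (100, 256) and (100, 128) with 112,256 learnable parameters, while "valid" padding shortens the sequence to 96 and then 94 positions.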

[0068] (2.5) Based on the model in (2.4), a global pooling layer is added at the end of the network. The model learns the complex interrelationships among the multi-feature time series in the input samples and optimizes the sub-network parameters using the distances within each triplet, that is, between the anchor and the same-class sample and between the anchor and the different-class sample.
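The distance-based optimization in (2.5) is commonly realized with a margin-based triplet loss. The text does not name the exact loss, so the standard form max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d and a margin of 1.0 is assumed here purely for illustration.

```python
import math
from typing import Sequence, Tuple

Embedding = Sequence[float]


def euclidean(u: Embedding, v: Embedding) -> float:
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Embedding, positive: Embedding, negative: Embedding,
                 margin: float = 1.0) -> float:
    """Zero once the negative is at least `margin` farther away than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def batch_triplet_loss(triplets: Sequence[Tuple[Embedding, Embedding, Embedding]],
                       margin: float = 1.0) -> float:
    """Mean loss over (anchor, positive, negative) embedding triplets."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)
```

Minimizing this loss pulls same-class embeddings together and pushes different-class embeddings apart until the margin is satisfied, after which a triplet no longer contributes a gradient.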

[0069] (2.6) After the training in (2.5), the parameters of any one trained sub-network are frozen. This network serves as the basic feature extractor for input samples and is used for all subsequent extraction of sample embeddings, which improves the model's ability to retain old knowledge.

[0070] (3) The pre-trained basic feature extractor is used to extract the embedding of each class of samples and to compute its prototype. A prototype update attention network with two attention heads is then constructed based on the multi-head self-attention mechanism. The prototype matrix is input into this network to update the weights of the feature distribution of each class prototype. A classification layer is constructed as a fully convolutional network with two convolutional layers: the first convolutional layer is followed by a batch normalization layer, an activation function, and a dropout layer, and the second convolutional layer outputs the classification result. Finally, random pseudo-incremental classes are drawn from the base class samples and input into the model for pre-training. The update process in this step realizes adaptive transfer and gradual updating of knowledge by computing transition coefficients, thereby replacing full retraining or overwriting updates with a "knowledge fusion constraint". This mechanism exploits the relationship between new and old knowledge to support extended recognition, which is the core of the scalability of this invention. The pre-training step enables the deep learning model to generalize better to new samples and to adapt to new classes that have only a limited number of labeled samples.

[0071] The specific process for this step is as follows:

[0072] (3.1) Use the trained basic feature extractor to extract embedding representations for each type of IoT device traffic in the dataset, calculate the mean of the embedding representations for each type of device as the prototype of that type, and use it for subsequent prototype updates;
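Sub-step (3.1) computes each class prototype as the mean of that class's embeddings. A minimal sketch, assuming the embeddings have already been produced by the frozen feature extractor and are given as plain lists of floats; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def compute_prototypes(embeddings: Sequence[Sequence[float]],
                       labels: Sequence[Hashable]) -> Dict[Hashable, List[float]]:
    """Return a mapping from class label to the mean embedding of that class."""
    if len(embeddings) != len(labels):
        raise ValueError("one label per embedding is required")
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = defaultdict(int)
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
        sums[label] = [s + e for s, e in zip(sums[label], emb)]  # running sum
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}
```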

[0073] (3.2) Construct a prototype update attention network. The network mainly consists of two attention heads, each containing three linear projections: query, key, and value. The prototypes of the existing categories are stacked to form a matrix, and as many zero column vectors as there are new categories are appended to the end of this matrix to obtain the old category prototype matrix. Replacing the zero columns of the old category prototype matrix with the prototypes of the new categories yields the prototype matrix to be updated. The old category prototype matrix is input into the key and value linear projections, respectively, to obtain the key matrix and the value matrix, and the prototype matrix to be updated is input into the query linear projection to obtain the query matrix.
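The matrix construction in (3.2) can be illustrated with a small sketch that stores each prototype as a column, appends one zero column per new category to form the old category prototype matrix, and fills those columns with the new prototypes to form the matrix to be updated. The function names and the list-of-rows matrix representation are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _columns_to_matrix(columns: List[List[float]]) -> Matrix:
    # Row i of the matrix holds component i of every prototype column.
    dim = len(columns[0])
    return [[col[i] for col in columns] for i in range(dim)]


def build_prototype_matrices(old_protos: Sequence[Sequence[float]],
                             new_protos: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Return (old category prototype matrix, prototype matrix to be updated).
    Both are d x (n_old + n_new), with one prototype per column."""
    dim = len(old_protos[0])
    if any(len(p) != dim for p in list(old_protos) + list(new_protos)):
        raise ValueError("all prototypes must share the embedding dimension")
    zero_cols = [[0.0] * dim for _ in new_protos]  # one zero column per new category
    old_matrix = _columns_to_matrix([list(p) for p in old_protos] + zero_cols)
    to_update = _columns_to_matrix([list(p) for p in old_protos] + [list(p) for p in new_protos])
    return old_matrix, to_update
```

The two matrices therefore have the same shape and differ only in the new-category columns, which is what lets the key/value side carry purely old knowledge while the query side already contains the new classes.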

[0074] (3.3) Based on (3.2), the query matrix output by the query linear projection is transposed and multiplied by the key matrix output by the key linear projection to learn the relationships between prototypes, yielding the correlation score matrix.

[0075] In the two steps above, the prototype update attention network achieves knowledge retention by constructing a mapping relationship between the old class prototype matrix and the prototype matrix to be updated; it uses linear projection to convert the old class prototype into a key matrix and a value matrix, and converts the prototype to be updated, which contains the initial representation of the new class, into a query matrix. It calculates the inter-class relevance score by performing a dot product operation between the query matrix and the key matrix, thereby achieving adaptive retrieval and updating of old knowledge.

[0076] (3.4) Based on (3.3), the correlation score matrix is scaled by multiplying it by d_k^(-1/2), that is, by dividing it by the square root of the key matrix dimension d_k. The Sigmoid activation function then normalizes the scaled matrix, completing the calculation of the transition coefficients.

[0077] (3.5) Based on (3.4), the transition coefficients are weighted onto the value matrix by multiplication to complete the weight update of the prototype in the current attention head;
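Sub-steps (3.3) to (3.5) amount to scaled dot-product attention with a Sigmoid in place of the usual Softmax. The sketch below computes one head, keeping prototypes as column vectors (passed as a list of vectors), so that entry (i, j) of the score matrix is the dot product of query i and key j. The projection matrices are supplied by hand here, whereas in the method they are learned; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def _matvec(w: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    # Linear projection without bias: w @ x, where w is out_dim x in_dim.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def transition_head(to_update: Sequence[Vector], old: Sequence[Vector],
                    w_q, w_k, w_v) -> Tuple[List[Vector], List[Vector]]:
    """One attention head of the prototype update network.
    to_update / old: prototype columns of the two matrices built in (3.2).
    Returns (updated prototypes, transition coefficient matrix)."""
    q = [_matvec(w_q, p) for p in to_update]  # queries from the matrix to be updated
    k = [_matvec(w_k, p) for p in old]        # keys from the old category matrix
    v = [_matvec(w_v, p) for p in old]        # values from the old category matrix
    scale = 1.0 / math.sqrt(len(k[0]))        # d_k ** -0.5
    # Transition coefficients: Sigmoid of the scaled score q_i . k_j
    coeffs = [[1.0 / (1.0 + math.exp(-_dot(qi, kj) * scale)) for kj in k] for qi in q]
    # Updated prototype i = sum_j coeffs[i][j] * v_j
    updated = [[sum(c * vj[t] for c, vj in zip(row, v)) for t in range(len(v[0]))]
               for row in coeffs]
    return updated, coeffs
```

Because each coefficient is an independent Sigmoid rather than a Softmax over keys, the coefficients for one query need not sum to 1, so each old prototype can contribute to an updated prototype independently of the others.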

[0078] (3.6) Based on (3.5), the prototype representations produced by the attention heads are concatenated, and the concatenated matrix is input into a linear projection layer for linear transformation. This integrates each head's attention over different parts of the prototypes and learns information at different levels between prototypes, completing the overall update of the prototype matrix.
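The head-merging step in (3.6) concatenates each prototype's outputs from all heads and applies a shared output projection. A minimal sketch, assuming each head's output is a list of per-prototype vectors and that the projection matrix `w_o` (a hypothetical name) is given rather than learned:

```python
from typing import List, Sequence

Vector = List[float]


def combine_heads(head_outputs: Sequence[Sequence[Vector]],
                  w_o: Sequence[Sequence[float]]) -> List[Vector]:
    """head_outputs[h][i] is prototype i as updated by head h.
    Concatenate each prototype's head outputs, then apply the output projection."""
    n_protos = len(head_outputs[0])
    merged: List[Vector] = []
    for i in range(n_protos):
        concat = [x for head in head_outputs for x in head[i]]  # head 1, then head 2, ...
        if any(len(row) != len(concat) for row in w_o):
            raise ValueError("w_o rows must match the concatenated width")
        merged.append([sum(w * c for w, c in zip(row, concat)) for row in w_o])
    return merged
```

With two heads, a projection that averages matching components, such as `[[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5]]`, maps head outputs `[1, 2]` and `[3, 4]` to `[2.0, 3.0]`.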

[0079] In the above two steps, after calculating the relevance score matrix, the present invention generates transition coefficients through scaling factor processing and the Sigmoid function. These coefficients serve as guiding weights for updating the prototype of the new class based on the knowledge of the old class. By applying the transition coefficients to the value matrix with weights, the empirical knowledge of the base class device identification is transferred to the prototype representation of the new device category, thereby mitigating the catastrophic forgetting of the model in the incremental learning process from a mechanism perspective.

[0080] (3.7) The samples of each class in the base class training set are randomly sampled and divided into a support set and a query set, and samples drawn at random from the support set are used to repeat sub-steps (3.1) to (3.6). Between (3.1) and (3.2), an operation is added that constructs pseudo-base classes and pseudo-incremental classes: the N prototypes of the pseudo-incremental classes are removed from the base class prototypes obtained in sub-step (3.1), and the remaining prototypes serve as the pseudo-base classes. This procedure is repeated multiple times. The invention thus employs a random pseudo-incremental strategy during pre-training: randomly dividing the base class sample set into pseudo-base classes and pseudo-incremental classes simulates dynamic device access in a real environment, and repeating the cycle of prototype extraction, correlation learning, and prototype updating over many training batches improves the model's generalization and scalability for newly added categories that have only limited labeled samples.
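One pseudo-incremental episode from (3.7) can be sketched as follows: a random subset of base classes plays the role of new classes, and each class is split into support and query samples. The per-class support size `k_support`, the seed handling, and the function name are assumptions made for illustration.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def make_pseudo_episode(samples_by_class: Dict[Hashable, Sequence],
                        n_pseudo_inc: int, k_support: int, seed: int = 0
                        ) -> Tuple[List[Hashable], List[Hashable], Dict, Dict]:
    """Return (pseudo_base, pseudo_inc, support, query) for one pre-training episode."""
    rng = random.Random(seed)
    classes = list(samples_by_class)
    if not 0 < n_pseudo_inc < len(classes):
        raise ValueError("need at least one pseudo-base and one pseudo-incremental class")
    pseudo_inc = rng.sample(classes, n_pseudo_inc)           # classes treated as "new"
    pseudo_base = [c for c in classes if c not in pseudo_inc]
    support, query = {}, {}
    for c in classes:
        pool = list(samples_by_class[c])
        if len(pool) <= k_support:
            raise ValueError(f"class {c!r} needs more than {k_support} samples")
        rng.shuffle(pool)
        support[c], query[c] = pool[:k_support], pool[k_support:]
    return pseudo_base, pseudo_inc, support, query
```

The pseudo-base prototypes are then the base prototypes with the pseudo-incremental ones removed, for example `{c: p for c, p in prototypes.items() if c in pseudo_base}`, and a fresh seed per episode gives a different split each time.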

[0081] (3.8) Based on (3.7), samples of categories that have appeared in the training phase are selected from the query set and input into the classification layer for a classification test. The classification layer is a fully convolutional network with two convolutional layers whose numbers of filters are equal to the embedding dimension and 1, respectively, and whose kernel size is 1. The first convolutional layer is followed by a batch normalization layer, a ReLU activation function, and a dropout layer with a dropout rate of 0.5. The Softmax function is then used to obtain the classification result.
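The text fixes the classification layer's shape (two kernel-size-1 convolutions with D and 1 filters) but not what it receives as input. One plausible reading, assumed here and not confirmed by the source, is that each class position carries a D-dimensional feature formed from the query embedding and that class's prototype (an element-wise product is used below), so the final 1-filter convolution yields one score per class for the Softmax. Batch normalization and dropout are omitted because the sketch covers inference only; all names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pointwise_conv(x: Matrix, w: Sequence[Sequence[float]], b: Sequence[float]) -> Matrix:
    # Kernel size 1: the same dense map (in_dim x out_dim) applied at every position.
    return [[sum(xi * w[i][o] for i, xi in enumerate(pos)) + b[o] for o in range(len(b))]
            for pos in x]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(query_emb: Sequence[float], prototypes: Sequence[Sequence[float]],
             w1, b1, w2, b2) -> List[float]:
    """Class probabilities for one query (inference: BN and dropout omitted)."""
    # Assumed input: one D-dim feature per class, query * prototype element-wise.
    x = [[q * p for q, p in zip(query_emb, proto)] for proto in prototypes]
    h = [[max(0.0, v) for v in pos] for pos in pointwise_conv(x, w1, b1)]  # D filters + ReLU
    scores = [pos[0] for pos in pointwise_conv(h, w2, b2)]  # 1 filter: one score per class
    return softmax(scores)
```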

[0082] (3.9) Repeat sub-steps (3.7) and (3.8) to perform pre-training, and continuously update the model parameters using the above-mentioned random pseudo-incremental training strategy to improve the scalability of the model.

[0083] (4) The model is trained and updated using the newly added category samples and their labels, and the test samples are input into the trained classification model to perform multi-class classification of IoT device traffic. Accuracy, precision, recall, and F1-score are used as evaluation metrics. For multiple updated models, the decline in each metric relative to the initial learning stage is compared, and the scalability of the invention when only limited new category samples are available is analyzed.
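The four evaluation metrics can be computed as below. The text does not state how per-class precision, recall, and F1-score are averaged, so macro averaging over all classes is assumed in this sketch; the function name is hypothetical.

```python
from typing import Dict, Hashable, Sequence


def evaluate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally long, non-empty label sequences")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    labels = set(y_true) | set(y_pred)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    n = len(labels)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}
```

For labels `["a", "a", "b", "b"]` predicted as `["a", "b", "b", "b"]`, accuracy is 0.75, macro precision 5/6, macro recall 0.75, and macro F1 11/15.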

[0084] This step specifically includes the following processes:

[0085] (4.1) The above IoT traffic dataset is divided into base categories and new categories. Four IoT device types, namely LiFX Smart Bulb, Triby Speaker, Withings Aura Smart Sleep Sensor, and Withings Smart Baby Monitor, are the new categories, and the remaining 14 types are the base categories.

[0086] (4.2) In the above multi-feature time series sample set, the data has the shape (B, L, C), where B is the number of samples in the input dataset, L is the configured maximum number of input packets per sample, and C is the number of features extracted from each packet. This multi-feature time series data is used as input to the subsequent deep learning model.
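A sketch of how per-flow packet features could be arranged into the (B, L, C) layout of (4.2). Truncating longer flows to their first L packets and zero-padding shorter ones are assumptions, since the text only states that L is a configured maximum; the function names are hypothetical.

```python
from typing import List, Sequence

Sample = List[List[float]]  # L packets x C features


def to_fixed_length(packets: Sequence[Sequence[float]], max_packets: int,
                    n_features: int) -> Sample:
    """Truncate to the first max_packets packets, then zero-pad (assumed policy)."""
    seq = [list(p) for p in packets[:max_packets]]
    if any(len(p) != n_features for p in seq):
        raise ValueError("every packet must have n_features values")
    seq += [[0.0] * n_features for _ in range(max_packets - len(seq))]
    return seq


def build_dataset(flows: Sequence[Sequence[Sequence[float]]], max_packets: int) -> List[Sample]:
    """Stack per-flow packet feature lists into a B x L x C nested list."""
    n_features = len(flows[0][0])
    return [to_fixed_length(flow, max_packets, n_features) for flow in flows]
```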

[0087] (4.3) The training samples and labels of the newly added device categories are used to train the classification model pre-trained in step (3), updating the learned network parameters and prototype representations;

[0088] (4.4) All samples of categories that appeared during the training phase are selected from the test samples and input into the trained classification model to determine the type of IoT device to which the traffic belongs. The types include: Nest Dropcam, Samsung SmartCam, TP-Link Day Night Cloud Camera, Withings Smart Baby Monitor, Insteon Camera, Netatmo Welcome, Smart Things, Amazon Echo Hub, Netatmo Weather Station, Triby Speaker, PIX-STAR Photo-frame, HP Printer, Withings Aura Smart Sleep Sensor, LiFX Smart Bulb, iHome Power Plug, TP-Link Smart Plug, Belkin Wemo Motion Sensor, and Belkin Wemo Switch, a total of 18 IoT device types;

[0089] (4.5) Let N be the number of incremental device categories used for each model update and test, and K the number of training samples per incremental category. With K set to 20, 10, 5, and 1, respectively, evaluate how the number of updates, which is determined by N, affects the model's overall accuracy on the dataset. The comparison results are shown in Figure 3.

[0090] The test group numbers in the figure denote the three repeated experiments conducted in this section. Comparing the test groups shows that a model updated once with N = 4 achieves higher recognition accuracy on the overall dataset than a model updated four times with N = 1. When the number of training samples per incremental category K is 20, the model's recognition accuracy over all device traffic categories remains around 90% even after four updates, reaching 89.39%. In summary, as the number of model updates increases, the distribution shift of the prototype representations grows, so the model forgets some old knowledge. With an appropriate number of training samples, however, the model still recognizes the overall traffic categories well across updates, resists the forgetting of old knowledge to a certain extent, and exhibits good scalability. Specific experimental results for this model are shown in Figure 4.
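The update schedule evaluated in (4.5) can be illustrated by splitting the four new categories into sessions of N classes, each with K training samples per class; with N = 4 this gives one update and with N = 1 four updates, matching the update counts reported above. The fixed class order, sampling without replacement, and the function name are assumptions.

```python
import random
from typing import Dict, List, Sequence

# The four new categories listed in (4.1).
NEW_CLASSES = ["LiFX Smart Bulb", "Triby Speaker",
               "Withings Aura Smart Sleep Sensor", "Withings Smart Baby Monitor"]


def plan_sessions(new_classes: Sequence[str], n_way: int, k_shot: int,
                  pool: Dict[str, Sequence], seed: int = 0) -> List[Dict[str, list]]:
    """Split new classes into update sessions of n_way classes with k_shot samples each."""
    rng = random.Random(seed)
    sessions = []
    for start in range(0, len(new_classes), n_way):
        group = new_classes[start:start + n_way]
        # Draw K training samples per class without replacement.
        sessions.append({c: rng.sample(list(pool[c]), k_shot) for c in group})
    return sessions
```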

[0091] The present invention also provides a scalable IoT device traffic identification device for limited samples, including a memory, a processor, and a computer program stored in the memory. The processor executes the computer program to implement the steps of the scalable IoT device traffic identification method for limited samples provided by the present invention.

[0092] It should be noted that the above content merely illustrates the technical concept of the present invention and should not be construed as limiting the scope of protection of the present invention. For those skilled in the art, various improvements and modifications can be made without departing from the principle of the present invention, and all such improvements and modifications fall within the scope of protection of the claims of the present invention.

Claims

1. A scalable IoT device traffic identification method for limited samples, characterized in that it includes the following steps: (1) obtain an IoT device traffic dataset, perform data preprocessing, extract the packet-level features and payload information of shorter traffic segments, construct the packet-level features and payload information into a multi-feature IoT traffic time series sample set, divide the samples into two categories, base classes and new classes, and divide them into a training set and a test set; (2) use a triplet network to build a metric learning model, add a hybrid attention mechanism to the sub-networks of the triplet network, construct triplet samples from the base class training samples as input to the metric learning model, and obtain a basic feature extractor after learning; (3) use a multi-head self-attention mechanism to build a prototype update attention network, combine the basic feature extractor trained in step (2) with the prototype update attention network and a classification layer to build a deep learning model, and use a random pseudo-incremental training strategy to input the base class samples into the deep learning model for pre-training; (4) use the newly added category samples and their labels to train and update the deep learning model pre-trained in step (3), and input the test samples into the trained deep learning model to classify IoT device traffic.

2. The scalable IoT device traffic identification method for limited samples according to claim 1, characterized in that step (1) specifically includes the following sub-steps: (1.1) process the IoT device traffic dataset, distinguish the traffic of each device type by MAC address, and label the traffic; (1.2) extract packet-level features and payload information from shorter traffic segments; (1.3) for each type of extracted feature information, construct a time series along the time dimension, and stack the different feature sequences of the same packets into a two-dimensional arrangement to construct multiple multi-feature time series sample sets; (1.4) divide the categories of the multi-feature time series samples into base class devices and newly added class devices, and divide all feature sample sets into a training set and a test set.

3. The scalable IoT device traffic identification method for limited samples according to claim 1, characterized in that in step (2), the sub-network includes two global attention modules, each followed by a one-dimensional convolutional network to form an attention network; each convolutional layer is followed by a batch normalization layer and an activation function; the sub-network ends with a global pooling layer; and the global attention module includes cascaded channel attention and spatial attention modules.

4. The scalable IoT device traffic identification method for limited samples according to claim 3, characterized in that step (2) specifically includes the following sub-steps: (2.1) construct a metric learning model consisting of three sub-networks with shared parameters and identical structures; select anchor samples for each class from the base class training samples, and select corresponding samples of the same class and of different classes to form triplet samples as input to the triplet network; (2.2) the input sample first enters the channel attention module of the sub-network and then passes through two fully connected layers that learn the weights of different channels, with a ReLU activation function providing the non-linearity between the two fully connected layers; normalized weights are generated by the Sigmoid activation function, and the channel attention weights are applied to the sample data by multiplication; (2.3) the data processed by the channel attention module is fed into the spatial attention module, where two convolutional layers fuse spatial information, each followed by a batch normalization layer and a ReLU activation function; finally, normalized weights are generated by the Sigmoid activation function and multiplied onto the sample data to complete the weight update; (2.4) the model learns the complex interrelationships between the multi-feature time series in the input samples and uses the distances between samples to optimize and update the sub-network parameters; (2.5) fix the parameters of any trained sub-network and use it as the basic feature extractor for the input samples.

5. The scalable IoT device traffic identification method for limited samples according to claim 1, characterized in that step (3) specifically includes the following sub-steps: (3.1) use the trained basic feature extractor to extract embedding representations for each type of IoT device traffic in the dataset, and calculate the mean of the embedding representations of each device type as the prototype of that type for subsequent prototype updates; (3.2) construct a prototype update attention network that includes two attention heads, each containing three linear projections: query, key, and value; stack the prototypes of the existing categories to form a matrix and append as many zero column vectors as there are new categories to the end of the matrix to obtain the old category prototype matrix; replace the zero columns of the old category prototype matrix with the prototypes of the new categories to obtain the prototype matrix to be updated; input the old category prototype matrix into the key and value linear projections, respectively, to obtain the key matrix and the value matrix, and input the prototype matrix to be updated into the query linear projection to obtain the query matrix; (3.3) transpose the query matrix and perform a dot product with the key matrix to learn the relationships between prototypes and output the relevance score matrix; (3.4) scale the relevance score matrix by the key dimension and then normalize it with the Sigmoid activation function to complete the calculation of the transition coefficients; (3.5) weight the value matrix with the transition coefficients by multiplication to complete the weight update of the prototypes in the current attention head; (3.6) concatenate the prototype representations obtained by the attention heads and input the concatenated matrix into a linear projection layer for linear transformation, integrating the outputs of the attention heads to learn information at different levels between prototypes and complete the overall update of the prototype matrix; (3.7) randomly sample each class of samples in the base class training set, divide them into a support set and a query set, and use samples randomly drawn from the support set to repeat sub-steps (3.1) to (3.6); between (3.1) and (3.2), construct pseudo-base classes and pseudo-incremental classes by randomly dividing the base class sample set, removing the prototypes of the pseudo-incremental classes from the base class prototypes obtained in sub-step (3.1), and using the remaining prototypes as the pseudo-base classes; repeat the above operation multiple times; (3.8) select samples of categories that have appeared in the training phase from the query set and input them into the classification layer for a classification test, the classification layer including a fully convolutional network with two convolutional layers, in which the first convolutional layer is followed by a batch normalization layer, a ReLU activation function, and a dropout layer, and the Softmax function is then used to obtain the classification result; (3.9) repeat sub-steps (3.7) and (3.8) to perform pre-training and continuously update the model parameters.

6. The scalable IoT device traffic identification method for limited samples according to claim 1, characterized in that step (4) specifically includes the following sub-steps: (4.1) input the base class samples and newly added class samples from the above multi-feature time series sample set into the model, train the model using the training samples and labels, and update the corresponding prototypes and network parameters; (4.2) select all category samples that have appeared in the training phase from the test samples and input them into the trained classification model to achieve scalable multi-class classification of IoT device traffic.

7. The scalable IoT device traffic identification method for limited samples according to claim 1, characterized in that it further includes: evaluating the classification results of the model on multiple metrics, and assessing its scalability when only a limited number of new class samples are available.

8. A scalable IoT device traffic identification device for limited samples, comprising a memory, a processor, and a computer program stored in the memory, characterized in that the processor executes the computer program to implement the steps of the scalable IoT device traffic identification method for limited samples according to any one of claims 1 to 7.