Lightweight traffic identification method and system for PCDN resource abuse

CN122268784A · Pending · Publication Date: 2026-06-23 · NANJING UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
NANJING UNIV OF POSTS & TELECOMM
Filing Date
2026-05-28
Publication Date
2026-06-23

AI Technical Summary

Technical Problem

Existing PCDN traffic identification methods suffer from high model complexity, high computational resource consumption, and difficulty in deploying on edge devices such as routers for real-time classification. Furthermore, methods based on deep packet inspection may involve user privacy information, while methods based on deep learning are highly dependent on server-side processing and cannot achieve low-latency identification at the network edge.

Method used

A lightweight CNN classification model is adopted. PCDN software traffic is captured by mobile terminal packet capture tools and PC-side network protocol analysis tools. Predefined network flow features are designed, key features are selected using a tree model, a lightweight CNN classification model is constructed, and it is converted into C language code and embedded into the router firmware. Real-time classification is then performed in conjunction with feature extraction tools.

Benefits of technology

It achieves accurate identification of PCDN software traffic on edge devices such as routers, reduces the resource consumption and complexity of model operation, ensures the practicality of network communication functions, and continuously optimizes model performance through dataset playback and real-world scenario testing.


Figure CN122268784A_ABST

Abstract

The application discloses a lightweight traffic identification method and system for PCDN resource abuse, belonging to the technical field of network traffic identification. The method includes: collecting PCDN traffic data, extracting network flow features, and selecting key features; constructing and training a lightweight CNN classification model; converting the trained model into C language code, solidifying it and embedding it in router firmware, where it works with a feature processing module to classify traffic in real time; and evaluating and optimizing the model's performance through dataset replay and real-environment testing. The application achieves lightweight real-time traffic identification on edge devices such as routers and provides an effective means of monitoring PCDN resource abuse.

Description

Technical Field

[0001] This invention relates to the field of network traffic identification technology, specifically to a lightweight traffic identification method and system for addressing PCDN resource abuse.

Background Technology

[0002] PCDN (Peer-to-Peer Content Delivery Network) technology has been widely used in content distribution, particularly in optimizing video streaming transmission and reducing bandwidth costs. PCDN combines the advantages of P2P technology and traditional CDN, building a low-cost content delivery network service by utilizing the massive fragmented idle resources of edge networks. This technology allows users to directly obtain the content they need from each other's computers, reducing the need for access to central servers. With the advancement of internet technology and the increasing demand for large-scale, multimedia content, PCDN provides an effective solution to address the challenges faced by traditional CDNs in scenarios such as streaming media, online games, and large file transfers.

[0003] Currently, research on PCDN traffic identification mainly focuses on deep learning and DPI (Deep Packet Inspection) technology. Deep learning-based traffic identification methods use deep neural networks to learn traffic features automatically, eliminating the reliance on feature engineering in traditional methods, and are often used for identifying encrypted traffic. However, they face problems such as strong data dependence, high annotation costs, high model complexity, and limited generalization ability, leading to unstable performance. Traditional methods are not flexible enough when handling encrypted traffic and complex, ever-changing P2P protocols, and for newly emerging P2P applications the feature database needs to be constantly updated, increasing maintenance costs and complexity. DPI-based methods examine the payload content of data packets, using string matching or regular expression matching to determine whether it contains known P2P protocol feature strings, known as application layer signatures. While highly accurate, these methods consume significant computing resources and storage space during deep analysis of data packets and may involve user privacy information.

Summary of the Invention

[0004] The technical problems this invention aims to solve are that existing PCDN traffic identification methods have high model complexity and high computational resource consumption and are difficult to deploy on edge devices such as routers for real-time classification; that methods based on deep packet inspection may involve user privacy information; and that methods based on deep learning depend heavily on server-side processing and cannot achieve low-latency traffic identification at the network edge. To solve these problems, this invention adopts the following technical solution:

[0005] First, this invention proposes a lightweight traffic identification method for PCDN resource abuse, comprising the following steps:

[0006] S1. Capture PCDN software traffic using mobile terminal packet capture tools and PC-side network protocol analysis tools and export it as a PCAP file; extract the quintuple and application name information from the PCAP file, and after data cleaning, divide the PCAP file according to the correspondence between the quintuple and the application name;

[0007] S2. Design predefined network flow features; extract the network flow features using a feature extraction tool for the PCAP files of each PCDN software; use a tree model to select key features.

[0008] S3. Construct a lightweight CNN classification model, which consists of three convolutional modules and three fully connected modules. Each convolutional module contains a convolutional layer, a ReLU activation function, and a Dropout regularization layer.

[0009] S4. Randomly divide the dataset into a training set and a test set according to a preset ratio; use the training set to train the CNN classification model for a preset number of rounds; after training, use the test set to evaluate the model performance;

[0010] S5. Convert the trained CNN model into C language code, solidify it as an inference module, and embed it into the router's firmware; embed the feature extraction tool into the router as a feature processing module;

[0011] S6. Test the deployed model using both dataset replay and real-world environment methods. Evaluate model performance using preset performance metrics, optimize the model based on the evaluation results, and output lightweight traffic identification results.

[0012] Preferably, step S1 specifically includes the following steps:

[0013] S101. At different times, the PCDN software is run on the mobile terminal to generate traffic. At the same time, the mobile terminal packet capture tool is opened to capture the traffic and export it as a PCAP file. On the PC, the network protocol analysis tool is used to capture the traffic packets mirrored on the router's LAN port and export them as a PCAP file.

[0014] S102. Extract the five-tuple information from the PCAP file exported by the mobile terminal packet capture tool. The five-tuple includes source IP, destination IP, source port, destination port, and protocol type. Clean the data and remove incomplete, damaged, or incorrectly formatted data packets.

[0015] S103. Based on the correspondence between the quintuple and the application name, divide the mirrored PCAP files exported on the PC. Each individual PCAP file corresponds to the traffic of a PCDN software and is named after that PCDN software.

[0016] Preferably, step S102, which extracts the quintuple information from the PCAP file exported by the mobile terminal packet capture tool, specifically includes the following steps:

[0017] S1021. Traverse the PCAP file exported by the mobile terminal packet capture tool, use the dpkt library to parse the data packets in the file, extract the five-tuple information and extract the application name information from the end of the data packet;

[0018] S1022. Obtain the last 32 bytes from the original payload of the data packet, extract the bytes corresponding to the application name from bytes 8 to 28, decode them according to UTF-8 encoding and remove null characters and spaces at both ends to obtain the application name;

[0019] S1023. Store the quintuple and the corresponding application name in a list as the basis for dividing the mirror PCAP file.
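Steps S1021 to S1023 can be illustrated with a minimal Python sketch. It assumes an Ethernet link type and IPv4 packets, neither of which the source states, and the function names `extract_app_name` and `read_labelled_five_tuples` are hypothetical. The 32-byte trailer and the 8 to 28 byte range come from step S1022; note that paragraph [0050] additionally removes every space from the name, whereas this sketch only strips both ends as described in S1022.

```python
import socket

TRAILER_LEN = 32            # last 32 bytes of the raw packet (S1022)
NAME_SLICE = slice(8, 28)   # bytes 8 to 28 of the trailer carry the app name


def extract_app_name(raw_packet: bytes) -> str:
    """Decode the application name from the trailing bytes of a raw packet."""
    trailer = raw_packet[-TRAILER_LEN:]
    if len(trailer) < TRAILER_LEN:
        return ""  # too short to carry a trailer
    name = trailer[NAME_SLICE].decode("utf-8", errors="ignore")
    return name.strip("\x00 ")  # drop null characters and spaces at both ends


def read_labelled_five_tuples(pcap_path: str) -> list:
    """Return [(five_tuple, app_name), ...] for every parsable TCP/UDP packet."""
    import dpkt  # third-party; imported lazily so the helper above has no dependency

    records = []
    with open(pcap_path, "rb") as f:
        for _ts, buf in dpkt.pcap.Reader(f):
            try:
                eth = dpkt.ethernet.Ethernet(buf)  # assumption: Ethernet framing
            except dpkt.dpkt.UnpackError:
                continue  # data cleaning: skip malformed packets (S102)
            ip = eth.data
            if not isinstance(ip, dpkt.ip.IP):
                continue  # assumption: IPv4 only
            l4 = ip.data
            if not isinstance(l4, (dpkt.tcp.TCP, dpkt.udp.UDP)):
                continue
            five_tuple = (
                socket.inet_ntoa(ip.src),
                socket.inet_ntoa(ip.dst),
                l4.sport,
                l4.dport,
                ip.p,  # protocol number, e.g. 6 for TCP, 17 for UDP
            )
            records.append((five_tuple, extract_app_name(buf)))
    return records
```

The returned list plays the role of the list in S1023: each entry pairs a quintuple with the application name recovered from the same packet.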

[0020] Preferably, step S6 specifically includes the following steps:

[0021] S601. In the dataset replay scenario, traffic replay is performed using pre-collected labeled PCAP files. The classification results are obtained through the feature processing module and the inference module, and the model performance is evaluated using accuracy, recall, and F1 score.

[0022] S602. In real-world scenarios, real traffic is generated by mobile terminals and classified in real time by routers. The real labels are compared with the predicted labels to evaluate the model performance.

[0023] S603. Based on the evaluation results, identify the strengths and weaknesses of the model performance, propose optimization suggestions, and repeat steps S1 to S6 to adjust and retrain the model.

[0024] Preferably, step S602 specifically includes the following steps:

[0025] S6021. Configure a packet capture tool on the mobile terminal to capture PCDN traffic as the real label source, and at the same time open a port on the PC to receive the classification results output by the router via UDP protocol.

[0026] S6022. Parse the PCAP file captured by the packet capture tool to obtain a list containing quintuples and real labels;

[0027] S6023. Parse the classification results received by the PC and obtain a list containing quintuples and predicted labels;

[0028] S6024. Traverse the two lists and perform stream matching using quintuples to obtain the correspondence between each predicted label and the true label.

[0029] S6025. Based on the matching results, evaluate the model performance using accuracy, recall, and F1 score.
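The matching and scoring in steps S6024 and S6025 might look like the following sketch. It assumes both lists use the same 5-tuple orientation and that precision, recall, and F1 are macro-averaged over classes; the source specifies neither, and the names `match_labels` and `evaluate` are hypothetical.

```python
def match_labels(true_list, pred_list):
    """Pair true and predicted labels that share the same 5-tuple (S6024)."""
    truth = dict(true_list)  # 5-tuple -> true label
    return [(truth[t], pred) for t, pred in pred_list if t in truth]


def evaluate(pairs):
    """Return accuracy and macro-averaged precision, recall, and F1 (S6025)."""
    if not pairs:
        raise ValueError("no matched flows to evaluate")
    labels = sorted({t for t, _ in pairs} | {p for _, p in pairs})
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

Predicted flows with no counterpart in the true-label list are dropped rather than counted as errors; whether that is the desired behavior depends on how unmatched flows should be treated.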

[0030] Preferably, in step S4, the training set accounts for 90% of the dataset, and the test set accounts for 10%; the CNN classification model is trained for 100 rounds.

[0031] Meanwhile, this invention proposes a lightweight traffic identification system for PCDN resource abuse, comprising:

[0032] The training module is configured to: acquire PCAP files of PCDN software traffic using a packet capture tool; segment and label the PCAP files based on quintuples and application names; extract predefined network flow features from the PCAP files; use a tree model to filter key features; and build and train a lightweight CNN classification model.

[0033] The deployment module, configured to be integrated into the router, includes: an inference module, which contains C language code derived from a trained CNN classification model for real-time classification of network traffic; and a feature processing module, which consists of embedded feature extraction tools for traffic capture, flow segmentation, and flow feature extraction.

[0034] The testing and evaluation module is configured to test the deployed model through both dataset replay and real-world environments, evaluate model performance using preset performance metrics, and output traffic identification results.

[0035] Preferably, in the training module, there are 72 predefined network flow features. The tree model is trained using the 72 features, and the 49 features with the highest scores are selected as key features based on the feature importance scores.

[0036] Furthermore, the present invention proposes a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described in the present invention.

[0037] Finally, the present invention proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the computer program is executed, it implements the method described in the present invention.

[0038] The present invention, by adopting the above technical solution, has the following beneficial effects:

[0039] (1) The present invention collects training data in multiple steps, designs and extracts traffic features, and builds a lightweight CNN classification model for training. It can accurately identify PCDN software traffic, effectively distinguish different traffic, and help network traffic management to be more targeted.

[0040] (2) This invention uses a feature selection algorithm to select key features, thereby achieving model lightweighting. While ensuring recognition accuracy, it reduces the resource consumption and complexity of model operation, making the model easier to deploy and apply.

[0041] (3) The present invention converts the trained model into a file suitable for embedded hardware and deploys it on edge devices such as routers. With the feature processing module and the inference module, it can realize real-time classification and prediction of network traffic, ensure the normal network communication function of the router, and enhance the practicality of network devices.

[0042] (4) The present invention performs comprehensive testing, evaluation, and optimization in both dataset replay and real-world scenarios. Model performance is measured through multi-step operations, and its strengths and weaknesses are analyzed to support continuous optimization, so that the model adapts better to actual application scenarios and its performance keeps improving.

Attached Figure Description

[0043] Figure 1 is an architecture diagram of the lightweight traffic identification method for PCDN resource abuse according to this invention.

Detailed Implementation

[0044] The technical solution of the present invention will be described in detail below with reference to the accompanying drawings.

[0045] Example 1: Referring to Figure 1, this embodiment achieves efficient identification and classification of PCDN software traffic by constructing and deploying a lightweight convolutional neural network model. The specific implementation steps are as follows:

[0046] Step S1: Collect training data.

[0047] S101: At different time periods, the mobile terminal runs PCDN software to generate PCDN software traffic. Simultaneously, a mobile terminal packet capture tool is used to capture the traffic on the mobile terminal side, and the captured results are exported as a PCAP file. On the PC side, a network protocol analysis tool is used to capture the router's LAN port mirrored traffic, and the captured results are exported as a PCAP file. Specifically, the mobile terminal packet capture tool (e.g., pcapdroid) is used to capture network packets generated by a specified application on the mobile terminal side and export them as PCAP format files; the network protocol analysis tool (e.g., Wireshark) is used on the PC side to collect, parse, and save network packets output from the router's mirror port.

[0048] S102: Extract the 5-tuple information of the PCDN software from the PCAP file exported by pcapdroid and store it in a list. The 5-tuple includes source IP, destination IP, source port, destination port, and protocol type. Clean the data, removing any incomplete, corrupted, or malformed packets.

[0049] S1021: Traverse the PCAP file captured by the mobile terminal packet capture tool, and use a packet parsing library to parse the packets in the PCAP file packet by packet, extracting the IP 5-tuple information and the application name information at the end of the packet. The packet parsing library can be the dpkt library, which is used to read and parse PCAP format files and extract network layer and transport layer field information from the packets.

[0050] S1022: Take the last 32 bytes of data from the original payload of the data packet, then extract the bytes corresponding to the application name from bytes 8 to 28 of this part of the data, and finally decode it according to UTF-8 encoding (ignore decoding errors), remove the null characters at both ends, and replace the spaces with empty strings to obtain the final application name.

[0051] S1023: Store the quintuple and its corresponding application name in a separate list as the basis for dividing the mirrored PCAP file.

[0052] S103: Based on the correspondence between the 5-tuple and the application name, divide the mirrored PCAP files. Each individual PCAP file represents the traffic of a PCDN software and is named using the PCDN software name.
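The division in step S103 can be sketched as a grouping of mirrored packets by application name. This is a minimal illustration under two assumptions not stated in the source: the mirrored capture sees the same addresses and ports as the mobile capture, and both directions of a flow should receive the same label. The names `flow_key` and `split_by_app` are hypothetical.

```python
from collections import defaultdict


def flow_key(five_tuple):
    """Direction-independent key so both directions of a flow match."""
    src_ip, dst_ip, src_port, dst_port, proto = five_tuple
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


def split_by_app(packets, labelled_tuples):
    """Group mirrored packets by application name.

    packets: iterable of (five_tuple, raw_bytes) from the mirrored capture.
    labelled_tuples: iterable of (five_tuple, app_name) from the mobile capture.
    """
    app_of = {flow_key(t): name for t, name in labelled_tuples if name}
    groups = defaultdict(list)
    for five_tuple, raw in packets:
        name = app_of.get(flow_key(five_tuple))
        if name is not None:  # packets without a known label are left out
            groups[name].append(raw)
    return dict(groups)
```

Each resulting group could then be written to its own PCAP file named after the application, as S103 describes.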

[0053] Step S2: Extract traffic characteristics.

[0054] S201: Design a set of network flow features, including the flow 5-tuple, TCP sliding window, TLS handshake packet information, packet length sequence, packet arrival time, flow-length-related features, and flow duration. The features contain no user privacy data and are used to analyze network flow behavior and patterns for more accurate traffic identification.

[0055] S202: For each PCDN software's PCAP file, use a traffic feature extraction tool to extract features and store them in a feature database for subsequent model training.

[0056] S203: Use a tree model to evaluate the importance of the network flow features, and select key features based on the evaluation results. This includes the following steps:

[0057] S2031: Using the network flow features extracted in step S202 as input and the corresponding PCDN software category labels as output, construct and train a tree model classifier; wherein, the tree model can be a decision tree, random forest, or gradient boosting decision tree.

[0058] S2032: After the tree model is trained, the network flow features are sorted according to the feature importance scores output by the tree model; the feature importance scores are used to characterize the degree of contribution of different features to the classification results.

[0059] S2033: Select 49 features as key features according to their feature importance scores from high to low, and use these key features as input to the subsequent lightweight CNN classification model to reduce the dimensionality of the input features and reduce the model inference overhead.
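Steps S2031 to S2033 can be sketched as follows. The random forest is one of the tree models the text allows; its hyperparameters and the function names `top_k_features` and `select_key_features` are assumptions for illustration, and scikit-learn is imported lazily so that the ranking helper runs on its own.

```python
def top_k_features(importances, names, k=49):
    """Return the k feature names with the highest importance scores (S2033)."""
    ranked = sorted(zip(importances, names), key=lambda p: p[0], reverse=True)
    return [name for _score, name in ranked[:k]]


def select_key_features(X, y, names, k=49):
    """Train a tree model (S2031) and rank features by importance (S2032)."""
    from sklearn.ensemble import RandomForestClassifier  # third-party

    # Hypothetical settings; the source does not give tree-model hyperparameters.
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X, y)
    return top_k_features(model.feature_importances_, names, k)
```

Here `X` would hold the 72 extracted features per flow and `y` the PCDN software labels; only the columns named in the returned list would be fed to the CNN.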

[0060] Step S3: Construct a lightweight CNN classification model, which consists of three convolutional modules and three fully connected modules. Each convolutional module consists of a convolutional layer, an activation function (ReLU), and a Dropout regularization layer, designed to identify PCDN software traffic.
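One possible reading of this architecture in Keras is sketched below. The source fixes only the module counts and the layers inside each convolutional module; the use of 1-D convolutions, the filter counts, kernel sizes, dropout rates, dense widths, optimizer, loss, and the number of classes are all assumptions. `count_parameters` is a hypothetical helper that shows how the input width affects model size.

```python
# Hypothetical hyperparameters: (filters, kernel_size, dropout_rate) per module.
CONV_MODULES = [(32, 3, 0.2), (64, 3, 0.2), (64, 3, 0.2)]
DENSE_UNITS = [128, 64]  # the third fully connected layer is the output layer


def count_parameters(num_features=49, num_classes=5):
    """Count trainable parameters for the layout above (padding='same')."""
    total, channels = 0, 1
    for filters, kernel, _rate in CONV_MODULES:
        total += kernel * channels * filters + filters  # weights + biases
        channels = filters
    width = num_features * channels  # size after flattening
    for units in DENSE_UNITS + [num_classes]:
        total += width * units + units
        width = units
    return total


def build_model(num_features=49, num_classes=5):
    """Three conv modules (Conv + ReLU + Dropout) and three dense layers."""
    import keras  # third-party; imported lazily
    from keras import layers

    model = keras.Sequential([keras.Input(shape=(num_features, 1))])
    for filters, kernel, rate in CONV_MODULES:
        model.add(layers.Conv1D(filters, kernel, padding="same"))
        model.add(layers.ReLU())
        model.add(layers.Dropout(rate))
    model.add(layers.Flatten())
    for units in DENSE_UNITS:
        model.add(layers.Dense(units, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Under these assumed sizes, most parameters sit in the first dense layer, whose width scales with the number of input features, so selecting fewer input features directly shrinks the model.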

[0061] Step S4: Train the designed model.

[0062] S401: The entire dataset is randomly divided into a training set and a test set, with the training set accounting for 90% and the test set accounting for 10%.

[0063] S402: The training set is used to train the model. The model is optimized by inputting the training data and undergoes 100 rounds of training to obtain the trained model.

[0064] S403: After training is complete, the test set is input into the trained model, and the resulting evaluation metrics are used to assess the model's performance on the test data.
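The 90%/10% random split in S401 can be sketched with the standard library alone. The fixed seed and the name `split_dataset` are assumptions added for reproducibility; the source only states the ratio and the 100 training rounds.

```python
import random

EPOCHS = 100  # number of training rounds stated in S402


def split_dataset(samples, train_ratio=0.9, seed=42):
    """Shuffle the samples and split them into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# With a Keras model, training and evaluation would then look roughly like:
#   model.fit(x_train, y_train, epochs=EPOCHS)
#   model.evaluate(x_test, y_test)
```

A stratified split, which keeps the class proportions equal in both sets, may be preferable when some PCDN applications have few flows, but the source describes a plain random division.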

[0065] Step S5, module deployment.

[0066] S501: The trained lightweight CNN classification model is converted into inference code suitable for embedded hardware and deployed as an inference module on the router. The lightweight CNN classification model is built and trained based on the Keras deep learning framework, which is used for CNN network architecture construction, parameter training, and model storage. After training, the CNN classification model is converted into C language code using an embedded AI conversion tool and embedded into the router firmware to achieve real-time traffic classification and prediction on the router side.

[0067] S502: The traffic feature extraction tool is embedded into the router as a feature processing module. The feature processing module includes three functions: traffic capture, flow segmentation, and automatic flow feature extraction.
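The source does not name the embedded AI conversion tool used in S501, so the following is only an illustration of what "solidifying" a trained model into C code can involve: rendering a layer's weights as a constant C array that is compiled into the firmware. The function `to_c_array` and the array name in the usage note are hypothetical.

```python
def to_c_array(name, values, per_line=8):
    """Render a flat sequence of floats as a C `static const float` array."""
    rows = []
    for i in range(0, len(values), per_line):
        chunk = values[i:i + per_line]
        rows.append("    " + ", ".join(f"{v:.8e}f" for v in chunk) + ",")
    body = "\n".join(rows)
    return f"static const float {name}[{len(values)}] = {{\n{body}\n}};\n"
```

For example, `to_c_array("conv1_bias", [0.5, -1.0, 2.0])` produces a three-element array declaration that a C inference routine could reference. A real conversion tool would also generate the layer computations themselves.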

[0068] Step S6: Conduct testing and optimization. Specifically, conduct testing, evaluation, and optimization in two different test scenarios: a dataset playback scenario and a real-world scenario.

[0069] S601: In the dataset replay scenario, traffic replay is performed using pre-collected labeled PCAP files. The router's feature processing and inference modules then yield the model classification results. The accuracy of the model classification is calculated, and metrics such as precision, recall, and F1 score are used to measure the model's performance.

[0070] S602: In real-world scenarios, mobile terminals generate real traffic, which is then classified in real time using a router. The model performance is then evaluated based on the prediction results.

[0071] S6021: The mobile terminal is configured with pcapdroid to capture PCDN traffic as the source of the actual labels. Simultaneously, the PC opens a port to receive the router's traffic classification results via UDP protocol.

[0072] S6022: Parse the PCAP file captured by pcapdroid to obtain a list of [5-tuple, true label] entries, in preparation for subsequent label matching.

[0073] S6023: Parse the model classification results received by the PC to obtain a list of [5-tuple, predicted label] entries for subsequent label matching and model evaluation.

[0074] S6024: Iterate through the two lists, using the 5-tuple for flow matching to obtain the correspondence between each predicted label and its true label. Ensure the accuracy of the matching process so that the model's performance is evaluated properly.

[0075] S6025: Calculate the accuracy of the model's predictions using the matched results, and use metrics such as precision, recall, and F1 score to measure the model's performance.

[0076] S603: Based on the evaluation results obtained in steps S601 and S602, analyze the classification performance of the lightweight CNN classification model on different PCDN software traffic. When the model's accuracy, recall, or F1 score does not meet the preset requirements, adjust the training data, key feature set, or model parameters, and re-execute the data acquisition, feature extraction, model training, model deployment, and testing evaluation processes to obtain a lightweight traffic identification model that meets the deployment requirements of edge devices such as routers.
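Steps S6021 and S6023 have the PC receive classification results from the router over UDP and turn them into a list of 5-tuples and predicted labels. The source does not describe the message format, so the sketch below assumes one comma-separated text datagram per flow; that format, the port number, and the names `parse_result` and `receive_results` are all hypothetical.

```python
import socket


def parse_result(datagram: bytes):
    """Parse 'src_ip,src_port,dst_ip,dst_port,proto,label' (assumed format)."""
    fields = datagram.decode("utf-8").strip().split(",")
    src_ip, src_port, dst_ip, dst_port, proto, label = fields
    five_tuple = (src_ip, dst_ip, int(src_port), int(dst_port), int(proto))
    return five_tuple, label


def receive_results(port=9999, max_messages=100, timeout=5.0):
    """Collect up to max_messages results, stopping after a quiet period."""
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))  # hypothetical listening port
        sock.settimeout(timeout)
        try:
            while len(results) < max_messages:
                data, _addr = sock.recvfrom(2048)
                results.append(parse_result(data))
        except socket.timeout:
            pass  # no datagram arrived within the timeout; stop listening
    return results
```

The list returned by `receive_results` corresponds to the [5-tuple, predicted label] list of S6023 and can be matched against the true labels by 5-tuple.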

[0077] During the testing and evaluation process, the system used PCAP files captured by mirroring the router's LAN port as the input data to be identified, and PCAP files captured by the mobile terminal as the source of the actual labels. For the PCAP files collected by the router, the system performed flow segmentation based on a five-tuple consisting of source IP, destination IP, source port, destination port, and protocol type, and extracted network flow features such as TCP window information, TLS handshake packet information, packet length sequence, packet arrival time, flow length-related features, and flow duration. Subsequently, a tree model was used to evaluate the importance of the network flow features, resulting in 49 key features. These 49 key features were then input into a trained lightweight CNN classification model, which output the predicted labels for the corresponding network flows.
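To make the per-flow feature extraction concrete, the sketch below computes a handful of statistics from the packets of one flow after 5-tuple segmentation. The feature names and the choice of statistics are illustrative assumptions; the source lists feature categories (packet lengths, arrival times, flow duration) but not the 72 individual definitions, and on the router this logic would run inside the embedded feature processing module rather than in Python.

```python
from statistics import mean


def flow_features(packets):
    """Compute a few statistics for one flow.

    packets: list of (timestamp_seconds, length_bytes) in capture order.
    """
    if not packets:
        raise ValueError("flow has no packets")
    times = [ts for ts, _ in packets]
    lengths = [n for _, n in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    return {
        "duration": times[-1] - times[0],
        "packet_count": len(packets),
        "total_bytes": sum(lengths),
        "mean_length": mean(lengths),
        "max_length": max(lengths),
        "mean_interarrival": mean(gaps) if gaps else 0.0,
    }
```

None of these statistics require reading packet payloads, which is consistent with the stated aim of avoiding user privacy data.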

[0078] For the PCAP file captured by the mobile terminal, the system extracts the 5-tuple information and application name information, and uses the application name as the source of the true label. The system matches the predicted label output from the router with the true label obtained from the mobile terminal using the 5-tuple, forming a correspondence between the true label and the predicted label for the same network flow, and calculates evaluation metrics such as accuracy, recall, and F1 score accordingly. This method verifies the effectiveness of the lightweight CNN classification model in real-time identification of PCDN traffic at the router.

[0079] To further illustrate the technical effects of this embodiment, a comparison of the key performance indicators of the system before and after feature screening is shown in Table 1.

[0080] Table 1

[0081]

[0082] As shown in Table 1, after evaluating the importance of network flow features and selecting key features using a tree model, the number of input features was reduced from 72 to 49, and the input dimensionality was reduced by approximately 31.94%. While maintaining consistency in the model output category and evaluation method, feature selection reduces the number of input features that the inference module needs to process, which helps reduce computational overhead on edge devices such as routers and improves the feasibility of model deployment on resource-constrained devices. Furthermore, establishing the correspondence between true and predicted labels using quintuples enables accurate evaluation of the same network flow identification results, providing a basis for subsequent model optimization.
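The stated reduction in input dimensionality follows directly from the two feature counts:

```latex
\[
\frac{72 - 49}{72} = \frac{23}{72} \approx 0.3194 = 31.94\%
\]
```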

[0083] Example 2: This example proposes a lightweight traffic identification system for PCDN resource abuse, including:

[0084] The training module is configured to: acquire PCAP files of PCDN software traffic using a packet capture tool; segment and label the PCAP files based on quintuples and application names; extract predefined network flow features from the PCAP files; use a tree model to filter key features; and build and train a lightweight CNN classification model.

[0085] The deployment module, configured to be integrated into the router, includes: an inference module, which contains C language code derived from a trained CNN classification model for real-time classification of network traffic; and a feature processing module, which consists of embedded feature extraction tools for traffic capture, flow segmentation, and flow feature extraction.

[0086] The testing and evaluation module is configured to test the deployed model through both dataset replay and real-world environments, evaluate model performance using preset performance metrics, and output traffic identification results.

[0087] In the training module, there are 72 predefined network flow features. The tree model is trained using these 72 features, and the 49 features with the highest scores are selected as key features based on their importance scores.

[0088] Specifically, in the training module, Wireshark was first used to capture PCDN software traffic via router port mirroring at multiple different time periods, and the data was saved as PCAP files. Next, a feature extraction tool was run to extract the 72 predefined network flow features from these PCAP files. A tree model was then trained on these features, and the 49 features with the highest importance scores were selected as key features. Finally, a lightweight CNN consisting of convolutional layers followed by fully connected layers further extracted features from this input and performed classification to identify the traffic.

[0089] The router deployment module comprises two key components: a feature processing module and an inference module. The feature processing module captures network traffic passing through the router in real time and segments it into flows for subsequent feature extraction. It then calculates the predefined network flow features for each flow, which serve as the input for classification. The inference module is obtained by converting the trained and validated Keras CNN model file into C language code. This conversion allows the model to be embedded into the router's firmware, so that the router runs the model directly. Once embedded, the model performs real-time inference on the feature data output by the feature processing module, classifying and identifying network traffic while the router maintains uninterrupted network communication.

[0090] In the testing module, PCDN software traffic is first captured. The pcapdroid tool is used to extract quintuple information and output the ground truth labels. Then, a lightweight CNN model is used to output predicted labels. These predicted labels are compared and matched with the ground truth labels provided by the pcapdroid tool to evaluate model performance. After the matching process is complete, model evaluation is performed by calculating the accuracy of the model's predictions and using key performance indicators such as precision, recall, and F1 score to measure the model's effectiveness.

[0091] Example 3: This example proposes a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method described in this invention.

[0092] Example 4: This example proposes an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the computer program is executed, it implements the method described in this invention.

[0093] It should be noted that the processing flow of embodiments 2-4 corresponds to the specific steps of the method provided in embodiment 1 of the present invention, and has the corresponding functional modules and beneficial effects of the method. Technical details not described in detail in this embodiment can be found in the method provided in embodiment 1 of the present invention.

[0094] The program code used to implement the methods of this application may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that when executed by the processor or controller, the functions / operations specified in the flowcharts and / or block diagrams are implemented. The program code may be executed entirely on a machine, partially on a machine, as a standalone software package partially on a machine and partially on a remote machine, or entirely on a remote machine or server.

[0095] The specific implementation schemes described above further illustrate the purpose, technical solution, and beneficial effects of the present invention. It should be understood that the above descriptions are merely specific implementation schemes of the present invention and are not intended to limit the scope of the present invention. Any equivalent changes and modifications made by those skilled in the art without departing from the concept and principles of the present invention should fall within the scope of protection of the present invention.

Claims

1. A lightweight traffic identification method for PCDN resource abuse, characterized in that it includes the following steps:
S1. Capture PCDN software traffic using mobile terminal packet capture tools and PC-side network protocol analysis tools and export it as a PCAP file; extract the quintuple and application name information from the PCAP file; after data cleaning, divide the PCAP file according to the correspondence between the quintuple and the application name;
S2. Design predefined network flow features; for the PCAP files of each PCDN software, extract the network flow features using a feature extraction tool; use a tree model to select key features;
S3. Construct a lightweight CNN classification model, which consists of three convolutional modules and three fully connected modules, each convolutional module containing a convolutional layer, a ReLU activation function, and a Dropout regularization layer;
S4. Randomly divide the dataset into a training set and a test set according to a preset ratio; use the training set to train the CNN classification model for a preset number of rounds; after training, evaluate the model performance using the test set;
S5. Convert the trained CNN model into C language code, solidify it as an inference module, and embed it into the router's firmware; embed the feature extraction tool into the router as a feature processing module;
S6. Test the deployed model using both dataset replay and real-world environment methods; evaluate model performance using preset performance metrics, optimize the model based on the evaluation results, and output lightweight traffic identification results.

2. The method according to claim 1, characterized in that step S1 specifically includes the following steps:
S101. At different times, run the PCDN software on the mobile terminal to generate traffic while the mobile terminal packet capture tool captures the traffic and exports it as a PCAP file; on the PC, use the network protocol analysis tool to capture the traffic packets mirrored from the router's LAN port and export them as a PCAP file;
S102. Extract the five-tuple information from the PCAP file exported by the mobile terminal packet capture tool, the five-tuple comprising source IP, destination IP, source port, destination port, and protocol type; clean the data by removing incomplete, damaged, or incorrectly formatted data packets;
S103. Based on the correspondence between the five-tuple and the application name, divide the mirror PCAP file exported on the PC into individual PCAP files, each of which corresponds to the traffic of one PCDN software and is named after that PCDN software.
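The division in step S103 can be sketched as a lookup from five-tuple to application name. The sketch below assumes packets have already been parsed into `(five_tuple, raw_bytes)` pairs, and it assumes that both directions of a flow should map to the same application, which the claim does not state. The names `flow_key` and `split_by_app` are hypothetical.

```python
from collections import defaultdict

def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent key: both directions of a flow give the same key.

    Treating a flow as bidirectional is an assumption of this sketch.
    """
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)

def split_by_app(labelled_flows, mirror_packets):
    """Group mirror-capture packets by application name.

    labelled_flows: iterable of (five_tuple, app_name) from the mobile capture.
    mirror_packets: iterable of (five_tuple, raw_bytes) from the PC capture.
    Packets whose flow has no known application are dropped.
    """
    key_to_app = {flow_key(*ft): app for ft, app in labelled_flows}
    per_app = defaultdict(list)
    for ft, raw in mirror_packets:
        app = key_to_app.get(flow_key(*ft))
        if app is not None:
            per_app[app].append(raw)
    return dict(per_app)
```

Each list in the returned dictionary would then be written to its own PCAP file named after the application.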

3. The method according to claim 2, characterized in that extracting the five-tuple information in step S102 from the PCAP file exported by the mobile terminal packet capture tool specifically includes the following steps:
S1021. Traverse the PCAP file exported by the mobile terminal packet capture tool, parse the data packets in the file using the dpkt library, extract the five-tuple information, and extract the application name information from the end of each data packet;
S1022. Obtain the last 32 bytes of the original payload of the data packet, take bytes 8 to 28 as the bytes corresponding to the application name, decode them as UTF-8, and remove null characters and spaces at both ends to obtain the application name;
S1023. Store the five-tuple and the corresponding application name in a list as the basis for dividing the mirror PCAP file.
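Step S1022 can be sketched as follows. The sketch operates on the raw bytes of one packet and leaves the dpkt parsing of step S1021 out so that it stays self-contained. Reading "bytes 8 to 28" as the Python slice `[8:28]` (20 bytes, zero-based, end-exclusive) is an assumption, as is the choice to ignore undecodable bytes; the function name `app_name_from_trailer` is hypothetical.

```python
def app_name_from_trailer(raw_packet: bytes):
    """Return the application name stored in the last 32 bytes of a packet.

    Assumption: "bytes 8 to 28" means trailer[8:28]. Returns None when the
    packet is too short or the name field is empty after trimming.
    """
    if len(raw_packet) < 32:
        return None
    trailer = raw_packet[-32:]
    name_bytes = trailer[8:28]
    # errors="ignore" drops bytes that are not valid UTF-8 (a sketch choice).
    name = name_bytes.decode("utf-8", errors="ignore")
    # Remove null characters and spaces at both ends, as in step S1022.
    name = name.strip("\x00 ")
    return name or None
```

For a packet whose trailer holds `com.example.pcdn` padded with null bytes, the function returns the string `com.example.pcdn`; the package name here is made up for illustration.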

4. The method according to claim 1, characterized in that step S6 specifically includes the following steps:
S601. In the dataset replay scenario, replay traffic from pre-collected labeled PCAP files, obtain the classification results through the feature processing module and the inference module, and evaluate the model performance using accuracy, recall, and F1 score;
S602. In the real-world scenario, generate real traffic on mobile terminals and classify it in real time on the router; compare the true labels with the predicted labels to evaluate the model performance;
S603. Based on the evaluation results, identify the strengths and weaknesses of the model, propose optimization suggestions, and repeat steps S1 to S6 to adjust and retrain the model.
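The metrics named in step S601 can be computed from paired true and predicted labels as in the sketch below. The claim does not say how per-class recall and F1 are combined for a multi-class problem; macro averaging (an unweighted mean over classes) is assumed here, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged recall, and macro-averaged F1.

    Macro averaging is an assumption of this sketch.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    recalls, f1s = [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }
```

The same function could serve both the dataset replay scenario and the real-world scenario, since both reduce to a list of true labels and a list of predicted labels.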

5. The method according to claim 4, characterized in that step S602 specifically includes the following steps:
S6021. Configure a packet capture tool on the mobile terminal to capture PCDN traffic as the source of true labels, and at the same time open a port on the PC to receive the classification results output by the router over UDP;
S6022. Parse the PCAP file captured by the packet capture tool to obtain a list of five-tuples and true labels;
S6023. Parse the classification results received by the PC to obtain a list of five-tuples and predicted labels;
S6024. Traverse the two lists and match flows by five-tuple to obtain the correspondence between each predicted label and its true label;
S6025. Based on the matching results, evaluate the model performance using accuracy, recall, and F1 score.
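The flow matching of step S6024 can be sketched with a dictionary keyed by five-tuple. The sketch assumes the two lists use identical five-tuple representations and that each five-tuple appears at most once in the predicted list (a later entry overwrites an earlier one); neither point is stated in the claim. The name `match_flows` is hypothetical.

```python
def match_flows(true_list, pred_list):
    """Pair true and predicted labels that share the same five-tuple.

    true_list: list of (five_tuple, true_label) from the mobile capture.
    pred_list: list of (five_tuple, predicted_label) received from the router.
    Returns (pairs, unmatched), where pairs is a list of
    (true_label, predicted_label) and unmatched lists the five-tuples
    for which the router reported no result.
    """
    pred_by_flow = {ft: label for ft, label in pred_list}
    pairs, unmatched = [], []
    for ft, true_label in true_list:
        if ft in pred_by_flow:
            pairs.append((true_label, pred_by_flow[ft]))
        else:
            unmatched.append(ft)
    return pairs, unmatched
```

Using a dictionary avoids comparing every entry of one list against every entry of the other, and the `unmatched` list makes it visible when the router produced no classification for a labeled flow.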

6. The method according to claim 1, characterized in that, in step S4, the training set accounts for 90% of the dataset and the test set accounts for 10%, and the CNN classification model is trained for 100 epochs.
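A random 90%/10% division of the dataset can be sketched as below. The fixed seed and the function name `split_dataset` are assumptions added for reproducibility of the example; the claim only fixes the ratio.

```python
import random

def split_dataset(samples, train_ratio=0.9, seed=0):
    """Randomly split samples into (train, test) lists by train_ratio.

    The seed is an assumption of this sketch so that the split is repeatable.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = round(len(indices) * train_ratio)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test
```

With 100 samples this yields 90 training samples and 10 test samples, and no sample appears in both sets.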

7. A lightweight traffic identification system for PCDN resource abuse, characterized in that it comprises:
a training module, configured to: acquire PCAP files of PCDN software traffic using packet capture tools, and divide and label the PCAP files based on the five-tuple and application name; extract predefined network flow features from the PCAP files and use a tree model to filter key features; and build and train a lightweight CNN classification model;
a deployment module, configured to be integrated into the router and including: an inference module, which contains C language code converted from the trained CNN classification model and performs real-time classification of network traffic; and a feature processing module, which consists of an embedded feature extraction tool for traffic capture, flow segmentation, and flow feature extraction;
a testing and evaluation module, configured to test the deployed model through both dataset replay and a real-world environment, evaluate model performance using preset performance metrics, and output traffic identification results.

8. The system according to claim 7, characterized in that, in the training module, there are 72 predefined network flow features; the tree model is trained on these 72 features, and the 49 features with the highest importance scores are selected as key features.
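Selecting the 49 highest-scoring features out of 72 amounts to a sort on the importance scores. The sketch below assumes the scores are already available as a mapping from feature name to score, for example as produced by whichever tree model is used; the claim does not name the tree model, and the function name `select_top_features` and the feature names in the usage note are hypothetical.

```python
def select_top_features(importances, k=49):
    """Return the names of the k features with the highest importance scores.

    importances: dict mapping feature name to importance score.
    Ties keep their original insertion order because sorted() is stable.
    """
    if k > len(importances):
        raise ValueError("k exceeds the number of available features")
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

Given 72 entries such as `{"f0": 0.0, ..., "f71": 0.71}`, the call `select_top_features(scores)` returns 49 names starting with the highest-scoring one.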

9. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the program is executed by a processor, it implements the method according to any one of claims 1-6.

10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method according to any one of claims 1-6.