Feature extraction method for encrypted network proxy traffic based on data packet frequency analysis
A traffic feature extraction technique based on frequency analysis, applied in data exchange networks, digital transmission systems, instruments, etc. It addresses the lack of fine-grained identification of Shadowsocks traffic and the resulting inability to classify such traffic at a fine-grained level, improving classification performance and accuracy.
Examples
Embodiment 1
[0068] This embodiment is a complete simulation of Shadowsocks encrypted proxy traffic feature extraction, carried out according to steps 1 to 4 of the present invention. The overall flow is shown in Figure 1: the network traffic features produced jointly by the highly discriminative packet extraction technique and the clustering results are used to classify encrypted proxy traffic.
[0069] First, the highly discriminative data packets are extracted; the specific process is shown in Figure 2. Assume that a captured data stream is expressed as F = (p_1, ..., p_n), where p_i denotes the i-th packet. Each packet p_i carries three pieces of information: packet direction, packet size, and packet flag information. If p_i is a SYN packet of length 54 sent from the client to the server, it is encoded as U_54_SYN, which represents a SYN packe...
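The encoding described in [0069] can be illustrated with a minimal Python sketch. Only the token U_54_SYN and the three fields (direction, size, flag) come from the text. The `Packet` class, the function names, the "D" prefix for server-to-client packets, and the flag labels other than SYN are assumptions made here for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Hypothetical container for the three fields named in [0069]."""
    from_client: bool  # True if the packet travels client -> server
    size: int          # packet length
    flags: str         # flag label, e.g. "SYN"


def encode_packet(p: Packet) -> str:
    # "U" for client -> server follows the U_54_SYN example in the text.
    # "D" for the opposite direction is an assumption, not from the source.
    direction = "U" if p.from_client else "D"
    return f"{direction}_{p.size}_{p.flags}"


def encode_flow(flow: List[Packet]) -> List[str]:
    # A flow F = (p_1, ..., p_n) becomes a sequence of word-like tokens.
    return [encode_packet(p) for p in flow]


# Illustrative flow; the sizes and the non-SYN flag labels are made up.
flow = [
    Packet(True, 54, "SYN"),
    Packet(False, 54, "SYN-ACK"),
    Packet(True, 54, "ACK"),
]
tokens = encode_flow(flow)
print(tokens)
```

Treating each packet as a token lets a flow be handled like a sentence, which is what makes word-frequency methods applicable to traffic.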
Embodiment 2
[0078] In this embodiment, the method of the present invention is compared with other traffic classification algorithms to verify its advantages and effectiveness. Network traffic classifiers built by combining the word-frequency-based traffic feature extraction method of the present invention (TF-IDF) with traditional machine learning algorithms, namely k-nearest neighbors (k-NN), support vector machine (SVM), and random forest (RANF), outperform the same algorithms used directly without this feature extraction. Using the same traffic data set to classify web traffic, the comparison results of the different methods are shown in Table 2:
[0079] Table 2 Comparison of classification accuracy of different methods
[0080]
Classification algorithm | k-NN   | k-NN_T | SVM    | SVM_T  | RANF   | RANF_T
Accuracy                 | 67.51% | 72.85% | 63.62% | 72.81% | 71.04% | 76.16%
[0081] It can be seen from Table 2...
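As a rough illustration of the word-frequency analysis behind the TF-IDF features, the sketch below computes textbook TF-IDF weights over flows that have already been encoded as packet tokens such as U_54_SYN. The exact weighting used by the invention is not visible in this excerpt, so the formula (term frequency times log(N / document frequency)), the function name `tf_idf`, and the sample flows are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(flows: List[List[str]]) -> List[Dict[str, float]]:
    """Textbook TF-IDF: each flow is a 'document', each packet token a 'word'."""
    n = len(flows)
    # Document frequency: in how many flows does each token occur?
    df = Counter()
    for tokens in flows:
        df.update(set(tokens))
    vectors = []
    for tokens in flows:
        counts = Counter(tokens)
        total = len(tokens)
        # Tokens present in every flow get weight log(n / n) = 0.
        vectors.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


# Made-up token sequences, used only to exercise the function.
flows = [
    ["U_54_SYN", "D_54_SYN-ACK", "U_54_ACK", "U_571_PSH"],
    ["U_54_SYN", "D_54_SYN-ACK", "U_54_ACK", "D_1514_ACK"],
    ["U_54_SYN", "D_54_SYN-ACK", "U_54_ACK", "U_571_PSH", "U_571_PSH"],
]
vectors = tf_idf(flows)
for v in vectors:
    print({t: round(w, 4) for t, w in v.items()})
```

Handshake tokens that occur in every flow receive zero weight, while tokens confined to a few flows are weighted up. Vectors of this kind could then be passed to a classifier such as k-NN, SVM, or random forest.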


