Method for protocol obfuscation based on unified description syntax

By employing a protocol obfuscation method based on a unified description syntax, which uses a state prediction decision tree and protocol obfuscation configuration files to obfuscate traffic at a local node and restore it at a remote node, the invention addresses the poor adaptability of existing techniques to changing monitoring strategies, protecting user privacy and countering traffic analysis.

CN117834289B · Active · Publication Date: 2026-06-12 · NANJING UNIV OF SCI & TECH

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents (China)
Current Assignee / Owner
NANJING UNIV OF SCI & TECH
Filing Date
2024-01-08
Publication Date
2026-06-12

Figure CN117834289B_ABST

Abstract

The application discloses a protocol obfuscation method based on a unified description syntax, comprising the following steps: for the target obfuscation traffic, extracting protocol characteristics such as its structure, message format, fields, and state machine; creating message definitions and event handlers according to the statistical characteristics and combining them with a custom protocol language and standard library to create a protocol obfuscation configuration file; training a state prediction decision tree on real target obfuscation protocol data; and finally, shaping the original traffic with an obfuscation method that simulates the message format and state machine of the target protocol. Through the analysis of protocol message formats and state machines and the application of a programmable protocol specification, the original protocol traffic of web browsing can be effectively transformed into the target obfuscation traffic so that the original protocol cannot be identified, which is of great significance for protecting user privacy and security and resisting traffic analysis attacks.

Description

Technical Field

[0001] This invention belongs to the field of network security technology, specifically to a protocol obfuscation method based on a unified description syntax.

Background Technology

[0002] With the rapid development of the Internet, people's dependence on the Internet is constantly increasing. The Internet has become an important part of modern society, but the problems of network security and privacy leakage are becoming increasingly prominent.

[0003] Although users can protect their data security to a certain extent through encryption and other means, the metadata of the traffic, such as packet length, remains unchanged. Attackers can still analyze traffic information using traffic inspection methods to obtain information about the user's network behavior. To protect user privacy and security, traffic obfuscation technology has emerged.

[0004] As traffic obfuscation technology has evolved, existing techniques have shown poor adaptability to rapidly changing monitoring strategies, and few protocol obfuscation methods are based on a unified description syntax. A reasonable, effective, and quickly configurable traffic obfuscation method is therefore needed.

Summary of the Invention

[0005] The purpose of this invention is to address the problems existing in the prior art by providing a protocol obfuscation method based on a unified description syntax. This method extracts the message format and state machine of web page traffic, writes a protocol obfuscation configuration file based on these characteristics, and trains a state prediction decision tree to determine the obfuscation strategy for traffic obfuscation.

[0006] The technical solution to achieve the purpose of this invention is: a protocol obfuscation method based on a unified description syntax, comprising the following steps:

[0007] Step 1: Based on the 5-tuple information, split the input target traffic sample into sessions, and preprocess them by marking, grouping, and numbering according to traffic type; the 5-tuple consists of the source address, destination address, source port, destination port, and protocol;

[0008] Step 2: Extract the message format and state machine of each complete stream one by one;

[0009] Step 3: Convert the state machine into a two-dimensional grayscale image and use it as input to the state prediction decision tree for state prediction training;

[0010] Step 4: Using the traffic characteristics from Step 2, generate the protocol obfuscation configuration file required for traffic obfuscation based on the protocol obfuscation method using the unified description syntax.

[0011] Step 5: Use the state prediction decision tree from Step 3 and the protocol obfuscation configuration file from Step 4 to obfuscate the original webpage traffic on the local morph node. Obfuscation methods include packet merging, new packet construction, data encryption, reconstruction of data according to the message format, and padding, thereby generating obfuscated traffic.

[0012] Step 6: If the obfuscated traffic passes through the monitoring network and reaches the remote recovery node, proceed to Step 7. If the obfuscated traffic cannot pass through the monitoring network, change the target webpage traffic sample and start executing from Step 1 again.

[0013] Step 7: On a remote restoration node outside the monitoring network domain, restore the obfuscated traffic to the original web page traffic through the restoration event in the protocol obfuscation configuration file, and access the original web page server.

[0014] Furthermore, in step 1, when the input traffic is split, the IP, port, and MAC bytes are all set to 0.
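A minimal sketch of this anonymization on a raw frame, assuming an untagged Ethernet II frame carrying IPv4 and TCP or UDP; IPv6, VLAN tags, and checksum recomputation are out of scope, so the IP and transport checksums are left stale. The function name `anonymize_frame` is hypothetical.

```python
def anonymize_frame(frame: bytes) -> bytes:
    """Zero the MAC, IPv4 address and port bytes of an Ethernet II frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    buf = bytearray(frame)
    buf[0:12] = bytes(12)                               # destination + source MAC addresses
    if len(buf) >= 34 and buf[12:14] == b"\x08\x00":    # EtherType 0x0800 = IPv4
        ihl = (buf[14] & 0x0F) * 4                      # IPv4 header length in bytes
        proto = buf[23]                                 # IPv4 protocol field
        buf[26:34] = bytes(8)                           # source + destination IPv4 addresses
        l4 = 14 + ihl                                   # start of the TCP/UDP header
        if proto in (6, 17) and len(buf) >= l4 + 4:     # TCP = 6, UDP = 17
            buf[l4:l4 + 4] = bytes(4)                   # source + destination ports
    return bytes(buf)
```

Zeroing these bytes keeps features such as lengths and payload structure while preventing a model from keying on specific hosts or ports.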

[0015] Furthermore, in step 1, before numbering, data packets with a payload length of 0 need to be removed.

[0016] Furthermore, the numbering in step 1 follows the PSH flag bit: a full-size packet is combined with the first subsequent non-full packet carrying the PSH flag to form one fragment.
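The two preprocessing rules above (drop zero-length payloads, then group by PSH) can be sketched as follows. This assumes "full" means a payload exactly equal to a given maximum segment size `mss`, and that any data left over without a closing PSH segment becomes a final fragment; `Segment` and `number_fragments` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    payload: bytes
    psh: bool          # TCP PSH flag


def number_fragments(segments: List[Segment], mss: int) -> List[Tuple[int, bytes]]:
    """Drop empty segments, then group segments into numbered fragments at PSH boundaries."""
    fragments: List[bytes] = []
    current = b""
    for seg in segments:
        if not seg.payload:
            continue                      # payload length 0 (e.g. a pure ACK): removed
        current += seg.payload
        if len(seg.payload) < mss and seg.psh:
            # The first non-full segment carrying PSH closes the fragment.
            fragments.append(current)
            current = b""
    if current:
        fragments.append(current)         # trailing data without a closing PSH segment
    return list(enumerate(fragments, start=1))
```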

[0017] Further, step 2 extracts the header protocol field, field separator or marker, fixed-length field, variable-length field, data type, encoding method and status code of each complete stream data packet.
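For HTTP-like web traffic, several of these features can be read directly from the header block. The sketch below is an illustration under assumptions: messages are raw HTTP/1.x bytes, a header field is classified as fixed-length if every observed value has the same length (so a field seen only once counts as fixed), and data types are not inferred. `extract_http_format` is a hypothetical name.

```python
from typing import Dict, List, Set


def extract_http_format(messages: List[bytes]) -> Dict[str, object]:
    """Infer separators, header fields, fixed/variable-length fields and status codes."""
    lengths: Dict[str, Set[int]] = {}
    status_codes: Set[str] = set()
    encodings: Set[str] = set()
    for raw in messages:
        head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
        start, *lines = head.split("\r\n")
        parts = start.split(" ")
        if parts[0].startswith("HTTP/") and len(parts) > 1:
            status_codes.add(parts[1])              # response status code
        for line in lines:
            name, _, value = line.partition(":")   # ":" is the field separator
            value = value.strip()
            lengths.setdefault(name, set()).add(len(value))
            if name.lower() == "content-encoding":
                encodings.add(value)                # encoding method
    return {
        "line_separator": "\r\n",
        "field_separator": ":",
        "fields": sorted(lengths),
        "fixed_length": sorted(n for n, ls in lengths.items() if len(ls) == 1),
        "variable_length": sorted(n for n, ls in lengths.items() if len(ls) > 1),
        "status_codes": sorted(status_codes),
        "encodings": sorted(encodings),
    }
```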

[0018] Furthermore, step 3 trains the state prediction decision tree model on the two-dimensional grayscale images converted from the state machines extracted in step 2; given a state machine input, the model outputs a plausible next state.
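The decision tree is not specified beyond its inputs and outputs. As a sketch under stated assumptions: each training image is a hypothetical one-hot encoding of the last `k` states (one row per time step, 255 for the active state), the label is the next observed state, and the tree is a small Gini-based CART written in plain Python rather than any particular library.

```python
from collections import Counter
from typing import List, Sequence, Union

Tree = Union[str, tuple]  # leaf = predicted next state; node = (pixel, left, right)


def window_image(history: Sequence[str], states: List[str], k: int) -> List[int]:
    """Flatten the last k states into a k x len(states) grayscale image (255 = active)."""
    recent = list(history)[-k:]
    rows = [None] * (k - len(recent)) + recent     # left-pad short histories
    return [255 if s == st else 0 for s in rows for st in states]


def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X: List[List[int]], y: List[str], depth: int = 0, max_depth: int = 8) -> Tree:
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    best = None
    for j in range(len(X[0])):
        left = [i for i, row in enumerate(X) if row[j] <= 127]
        right = [i for i, row in enumerate(X) if row[j] > 127]
        if not left or not right:
            continue                               # this pixel does not split the samples
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, j, left, right)
    if best is None:
        return majority                            # identical images with different labels
    _, j, left, right = best
    return (j,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict_next(tree: Tree, image: List[int]) -> str:
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if image[j] > 127 else left
    return tree


def train(sequences: List[List[str]], states: List[str], k: int = 2) -> Tree:
    X, y = [], []
    for seq in sequences:
        for t in range(1, len(seq)):
            X.append(window_image(seq[:t], states, k))
            y.append(seq[t])
    return build_tree(X, y)
```

Identical windows followed by different states (for example a request/response loop that sometimes ends) cannot be separated; the tree then falls back to the majority label.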

[0019] Furthermore, in step 4, the target traffic characteristics are utilized, and a custom protocol language and standard library are combined to create message definitions and event handlers to implement the protocol obfuscation configuration file.
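The text names the ingredients of the configuration file (message definitions, event handlers, a custom protocol language, a standard library) but not its syntax. The Python sketch below models an already-parsed configuration as plain data classes; `FieldDef`, `MessageDef`, `ObfuscationConfig`, and the event name `"send"` are hypothetical, and the example record is loosely modelled on a TLS record header (content type 0x17, version 0x0303) purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FieldDef:
    name: str
    kind: str            # "fixed" (constant bytes), "length" (2-byte body length) or "body"
    value: bytes = b""


@dataclass
class MessageDef:
    name: str
    fields: List[FieldDef]

    def build(self, body: bytes) -> bytes:
        """Serialize a message of this type around the given body."""
        out = bytearray()
        for f in self.fields:
            if f.kind == "fixed":
                out += f.value
            elif f.kind == "length":
                out += len(body).to_bytes(2, "big")
            elif f.kind == "body":
                out += body
            else:
                raise ValueError(f"unknown field kind: {f.kind}")
        return bytes(out)


@dataclass
class ObfuscationConfig:
    messages: Dict[str, MessageDef]
    handlers: Dict[str, Callable] = field(default_factory=dict)

    def on(self, event: str):
        """Decorator that registers an event handler (e.g. send, restore, timeout)."""
        def register(fn: Callable) -> Callable:
            self.handlers[event] = fn
            return fn
        return register

    def fire(self, event: str, *args):
        return self.handlers[event](*args)


# Example configuration: one message definition plus a "send" event handler.
config = ObfuscationConfig(messages={
    "record": MessageDef("record", [
        FieldDef("type", "fixed", b"\x17"),
        FieldDef("version", "fixed", b"\x03\x03"),
        FieldDef("length", "length"),
        FieldDef("body", "body"),
    ]),
})


@config.on("send")
def on_send(data: bytes) -> bytes:
    return config.messages["record"].build(data)
```

Keeping message layouts and handlers as data is what allows a new target protocol to be adopted by editing the configuration rather than the obfuscation engine.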

[0020] Furthermore, in step 5, the local obfuscation node performs protocol obfuscation by message format simulation, using three obfuscation methods: encryption, reconstruction, and padding. Specifically:

[0021] The original data is encrypted using the encryption method and key preset in the protocol obfuscation configuration file.

[0022] The original data is reconstructed based on the message format preset in the protocol obfuscation configuration file;

[0023] The header of the original data, or data that falls short of the required length, is padded using the predefined padding fields in the protocol obfuscation configuration file.
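A minimal sketch of the three message-format operations, under explicit assumptions: the configured cipher is replaced by an illustrative SHA-256 counter-mode keystream (not a vetted or authenticated cipher), the target message format is a hypothetical header `17 03 03` followed by a 2-byte length and a fixed 64-byte body, a per-frame nonce is carried in the body, and the padding field is a zero byte. The payload length is encrypted together with the payload so that only a key holder can tell where padding begins.

```python
import hashlib
import struct

# Hypothetical values standing in for entries of the protocol obfuscation configuration file.
TARGET_HEADER = b"\x17\x03\x03"   # assumed record prefix of the target protocol
FILL_BYTE = b"\x00"               # assumed predefined padding field
FRAME_BODY_SIZE = 64              # assumed fixed body size of the target message format


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Illustrative SHA-256 counter-mode keystream (placeholder for the configured cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + struct.pack(">Q", counter)).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def obfuscate(payload: bytes, key: bytes, nonce: bytes) -> bytes:
    # 1) Encryption: prefix the payload with its 2-byte length and encrypt both.
    plain = struct.pack(">H", len(payload)) + payload
    cipher = xor(plain, keystream(key, nonce, len(plain)))
    # 2) Reconstruction: place the nonce and ciphertext into the target message body.
    body = nonce + cipher
    if len(body) > FRAME_BODY_SIZE:
        raise ValueError("payload too large for one frame; split it first")
    # 3) Padding: fill the remaining space with the predefined fill field.
    body += FILL_BYTE * (FRAME_BODY_SIZE - len(body))
    return TARGET_HEADER + struct.pack(">H", len(body)) + body
```

A nonce must never be reused with the same key; a real deployment would take both the cipher and the frame layout from the configuration file.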

[0024] Furthermore, in step 5, the local obfuscation node performs protocol obfuscation by state machine simulation, using two obfuscation methods: packet merging and new packet construction. Specifically:

[0025] Network protocol state machines can be divided into control state machines and data transmission state machines. Control state machines include handshake state machines, error handling state machines, etc.

[0026] When a data transmission state machine of the original protocol must simulate a control state machine of the target obfuscation protocol, a new packet is constructed to reproduce that control state machine.

[0027] When a control state machine of the original protocol must simulate a data transmission state machine of the target obfuscation protocol, the control packet is merged with the next data transmission packet of the original protocol.

[0028] When the state machine type of the original protocol matches that of the target obfuscation protocol, the state machine is simulated normally and only the data is replaced.
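The three cases can be sketched as a mapping from the original packet sequence onto the sequence of state types expected by the target protocol. This is a simplified illustration: each packet is tagged only as `"control"` or `"data"`, the target sequence is assumed to be given (for example by the state prediction decision tree), and `filler` is a hypothetical stand-in for whatever the configuration file would place in a constructed packet.

```python
from collections import deque
from typing import List, Tuple

Packet = Tuple[str, bytes]   # (state type: "control" or "data", payload)


def simulate_state_machine(original: List[Packet], target: List[str],
                           filler: bytes = b"") -> Tuple[List[Packet], List[Packet]]:
    """Map original packets onto the target protocol's sequence of state types.

    Returns the emitted packets and any original packets not yet consumed.
    """
    queue = deque(original)
    emitted: List[Packet] = []
    for wanted in target:
        if not queue:
            break
        kind, payload = queue[0]
        if kind == wanted:
            # Same type: keep the state and carry the original data in it.
            queue.popleft()
            emitted.append((wanted, payload))
        elif kind == "data" and wanted == "control":
            # Construct a new packet to play the target's control state;
            # the original data packet waits for the next data state.
            emitted.append(("control", filler))
        else:
            # Original control vs target data: merge the control packet
            # with the next original data packet, if there is one.
            queue.popleft()
            merged = payload
            if queue and queue[0][0] == "data":
                merged += queue.popleft()[1]
            emitted.append(("data", merged))
    return emitted, list(queue)
```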

[0029] Furthermore, in step 6, when the obfuscated traffic is intercepted by the monitoring network, a timeout event is triggered to replace the protocol obfuscation configuration file. When all configuration files become invalid, the target traffic must be replaced and the process restarted from step 1.
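The timeout-driven fallback can be sketched as a loop over candidate configuration files. `try_send` is a hypothetical callable that reports whether the remote recovery node received the traffic before the timeout; `AllConfigsInvalid` is a hypothetical exception signalling that the process must restart from step 1 with new target traffic.

```python
from typing import Callable, Sequence


class AllConfigsInvalid(Exception):
    """Every configuration file has timed out; resample target traffic (step 1)."""


def send_with_fallback(configs: Sequence[str],
                       try_send: Callable[[str, float], bool],
                       timeout: float = 5.0) -> str:
    """Return the first configuration whose traffic got through the monitoring network."""
    for cfg in configs:
        # try_send returns False on a timeout, i.e. the traffic was intercepted.
        if try_send(cfg, timeout):
            return cfg
        # Timeout event: fall through to the next configuration file.
    raise AllConfigsInvalid("all configuration files invalid; restart from step 1")
```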

[0030] Furthermore, in step 7, the obfuscated traffic is restored by first decrypting with the encryption method and key preset in the protocol obfuscation configuration file to obtain the payload length, then decrypting the payload and removing the padding, thereby restoring the original web page traffic and enabling access to the original web page server.
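The restoration order described above (decrypt to obtain the payload length, then decrypt and strip padding) can be sketched as follows. It assumes a hypothetical frame layout and cipher that a matching morph node would also use: header `17 03 03`, a 2-byte body length, an 8-byte nonce, then the encrypted 2-byte payload length, the encrypted payload, and padding, all under an illustrative SHA-256 counter-mode keystream. None of these names or sizes come from the patent.

```python
import hashlib
import struct

TARGET_HEADER = b"\x17\x03\x03"   # assumed record prefix; must match the morph node's config
NONCE_LEN = 8                      # assumed nonce size carried inside the frame body


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Illustrative SHA-256 counter-mode keystream (stand-in for the configured cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + struct.pack(">Q", counter)).digest()
        counter += 1
    return bytes(out[:length])


def xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def restore(frame: bytes, key: bytes) -> bytes:
    if frame[:3] != TARGET_HEADER:
        raise ValueError("not an obfuscated frame")
    (body_len,) = struct.unpack(">H", frame[3:5])
    body = frame[5:5 + body_len]
    nonce, cipher = body[:NONCE_LEN], body[NONCE_LEN:]
    # 1) Decrypt only the 2-byte length prefix to learn the payload length.
    (payload_len,) = struct.unpack(">H", xor(cipher[:2], keystream(key, nonce, 2)))
    if 2 + payload_len > len(cipher):
        raise ValueError("corrupt frame: length exceeds body")
    # 2) Decrypt prefix + payload; the bytes after them are padding and are discarded.
    plain = xor(cipher[:2 + payload_len], keystream(key, nonce, 2 + payload_len))
    return plain[2:]
```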

[0031] Compared with the prior art, the significant advantages of this invention are:

[0032] 1) By deeply exploring the protocol characteristics of web page traffic, attributes such as header protocol fields, field separators or markers, fixed-length fields, variable-length fields, data types, encoding methods, and status codes are converted into grayscale images. Combined with a decision tree model, these are used to predict and derive the state machine of the target traffic, characterizing the target traffic accurately.

[0033] 2) By leveraging target traffic characteristics and combining a custom protocol language and standard library to create message definitions and event handlers, a protocol obfuscation configuration file is implemented to determine the traffic obfuscation strategy. Responding to rapidly changing monitoring strategies therefore requires only minor modifications, which improves obfuscation adaptability.

[0034] 3) It offers good portability and deployment flexibility: it can be deployed as a plugin on local and remote proxy nodes, which then act as the local morph node and the remote recovery node, respectively.

[0035] 4) Lower obfuscation cost: against a monitoring network that performs only simple protocol analysis, only message format simulation is performed. If that simulation fails, the monitoring network is judged to perform state machine identification and matching in its protocol analysis, and state machine simulation is added on top of message format simulation.
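The escalation rule in item 4 amounts to trying the cheaper simulation level first. In this sketch `probe` is a hypothetical callable that sends traffic at the given level and reports whether it passed the monitoring network; the level names are illustrative labels only.

```python
from typing import Callable, Optional, Tuple

FORMAT_ONLY = ("message_format",)
FORMAT_AND_STATE = ("message_format", "state_machine")


def choose_simulation(probe: Callable[[Tuple[str, ...]], bool]) -> Optional[Tuple[str, ...]]:
    """Return the cheapest simulation level that gets through, or None if both fail."""
    for level in (FORMAT_ONLY, FORMAT_AND_STATE):
        if probe(level):
            return level
    return None
```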

[0036] This invention, through the analysis of protocol message formats and state machines, combined with the application of programmable protocol specifications, can effectively transform web page traffic into target obfuscated traffic, making its original protocol traffic unrecognizable. This is of great significance for protecting user privacy and security and combating traffic analysis attacks.

Attached Figure Description

[0037] Figure 1 is a schematic diagram of the protocol obfuscation method based on the unified description syntax of the present invention.

Detailed Implementation

[0038] To make the objectives, technical solutions, and advantages of this application clearer, the following detailed description is provided in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the scope of this application.

[0039] With reference to Figure 1, this invention proposes a protocol obfuscation method based on a unified description syntax, comprising:

[0040] The input sample is the target webpage traffic, taking Google webpage traffic as an example.

[0041] Feature extraction: First, the traffic is split into sessions based on the 5-tuple. Then, features such as header protocol fields, field separators or markers, fixed-length fields, variable-length fields, data types, encoding methods, and status codes are extracted.

[0042] State prediction decision tree model generation: The state machine is predicted and derived by combining the characteristics of target webpage traffic with a decision tree model.

[0043] Traffic obfuscation: Using the characteristics of target webpage traffic, the protocol obfuscation method based on a unified description syntax simulates the target's message format and state machine on the original webpage traffic packets, thereby generating obfuscated traffic.

[0044] System deployment: Deploy the web traffic obfuscation method as a plugin on local and remote proxy nodes, so that they act as local morph nodes and remote recovery nodes, respectively.

[0045] Specifically, in one embodiment, a protocol obfuscation method based on a unified description syntax is provided, the method comprising the following steps:

[0046] Step 1: Based on the 5-tuple information, split the input target traffic sample into sessions, and preprocess them by marking, grouping, and numbering according to traffic type; the 5-tuple consists of the source address, destination address, source port, destination port, and protocol;

[0047] Here, while the input traffic samples are split, the MAC, IP, and port bytes are all set to 0.

[0048] Before numbering, packets with a payload length of 0 need to be removed.

[0049] Here, the numbering follows the PSH flag bit: a full-size packet is combined with the first subsequent non-full packet carrying the PSH flag to form one fragment.
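The session split in Step 1 can be sketched as grouping parsed packets by 5-tuple. This assumes packets are already parsed into a hypothetical `Pkt` record, and that both directions of one connection belong to the same session (the text does not say whether sessions are bidirectional), so the two endpoints are ordered before forming the key.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Pkt(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: str
    payload: bytes


def session_key(p: Pkt) -> Tuple:
    """5-tuple key with endpoints ordered so both directions share one session."""
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (min(a, b), max(a, b), p.proto)


def split_sessions(packets: List[Pkt]) -> Dict[Tuple, List[Pkt]]:
    sessions: Dict[Tuple, List[Pkt]] = defaultdict(list)
    for p in packets:
        sessions[session_key(p)].append(p)   # arrival order is preserved per session
    return dict(sessions)
```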

[0050] Step 2: Extract the message format and state machine of each complete stream one by one;

[0051] Here, the header protocol fields, field separators or markers, fixed-length fields, variable-length fields, data types, encoding methods, and status codes of each complete stream data packet are extracted one by one.

[0052] Step 3: Convert the state machine into a two-dimensional grayscale image and use it as input to the state prediction decision tree for state prediction training;

[0053] Here, the state prediction decision tree model is trained on the two-dimensional grayscale images converted from the state machines extracted in step 2; given a state machine input, the model outputs a plausible next state.
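The text does not specify how a state machine is laid out as a two-dimensional grayscale image. One hypothetical encoding, shown only as an illustration, is the state-transition count matrix scaled so that the most frequent transition becomes 255; rows and columns follow the alphabetical order of the state names.

```python
from typing import List, Sequence, Tuple


def state_machine_to_grayscale(seq: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode a state sequence as an N x N grayscale image of transition counts.

    Row i, column j holds how often state i was followed by state j,
    scaled so that the most frequent transition becomes 255.
    """
    states = sorted(set(seq))                       # axis order: alphabetical
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for a, b in zip(seq, seq[1:]):
        counts[index[a]][index[b]] += 1
    peak = max((c for row in counts for c in row), default=0) or 1
    image = [[round(255 * c / peak) for c in row] for row in counts]
    return states, image
```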

[0054] Step 4: Using the traffic characteristics from Step 2, generate the protocol obfuscation configuration file required for traffic obfuscation based on the protocol obfuscation method using the unified description syntax.

[0055] This approach leverages target traffic characteristics, combined with a custom protocol language and standard library, to create message definitions and event handlers to implement a protocol obfuscation configuration file.

[0056] Step 5: Using the state prediction decision tree from Step 3 and the protocol obfuscation configuration file from Step 4, the original webpage traffic is obfuscated at the local morph node. Obfuscation methods include packet merging, new packet construction, data encryption, reconstruction of data according to the message format, and padding, thereby generating obfuscated traffic.

[0057] This process encrypts the original data using the encryption method and key preset in the protocol obfuscation configuration file; reconstructs the original data according to the message format preset in the configuration file; and pads the header of the original data, or data falling short of the required length, using the padding fields preset in the configuration file. The state machines of a network protocol can be divided into control state machines and data transmission state machines; control state machines include handshake state machines, error handling state machines, etc. When a data transmission state machine of the original protocol must simulate a control state machine of the target obfuscation protocol, a new packet is constructed to reproduce that control state machine. When a control state machine of the original protocol must simulate a data transmission state machine of the target obfuscation protocol, the control packet is merged with the next data transmission packet of the original protocol. When the state machine types of the original and target obfuscation protocols match, the state machine is simulated normally and only the data is replaced.

[0058] Step 6: If the obfuscated traffic passes through the monitoring network and reaches the remote recovery node, proceed to Step 7. If the obfuscated traffic cannot pass through the monitoring network, change the target webpage traffic sample and start executing from Step 1 again.

[0059] Here, "cannot pass through the monitoring network" means that the obfuscated traffic is identified and blocked by the monitoring network.

[0060] Step 7: On a remote restoration node outside the monitoring network domain, restore the obfuscated traffic to the original web page traffic through the restoration event in the protocol obfuscation configuration file, and access the original web page server.

[0061] Here, the obfuscated traffic is restored by first decrypting with the encryption method and key preset in the protocol obfuscation configuration file to obtain the payload length, then decrypting the payload and removing the padding.

[0062] This invention delves into the message format and state machine of web page traffic, trains a state prediction decision tree using real data, and determines the obfuscation strategy by writing a protocol obfuscation configuration file for the message format of the target obfuscation protocol. This effectively obfuscates web page traffic and is of great significance for protecting user privacy and security and combating traffic analysis attacks.

[0063] The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited to the above embodiments. The embodiments and descriptions in the specification are merely illustrative of the principles of the invention. Various changes and modifications can be made to the invention without departing from its spirit and scope, and all such changes and modifications fall within the scope of the present invention as claimed. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Claims

1. A protocol obfuscation method based on a unified description syntax, characterized in that it comprises the following steps:
Step 1: Based on the 5-tuple information, split the input target traffic sample into sessions, and preprocess them by marking, grouping, and numbering according to traffic type; the 5-tuple consists of the source address, destination address, source port, destination port, and protocol.
Step 2: Extract the message format and state machine of each complete stream one by one.
Step 3: Convert the state machine into a two-dimensional grayscale image and use it as input to the state prediction decision tree for state prediction training.
Step 4: Using the traffic characteristics from Step 2, generate the protocol obfuscation configuration file required for traffic obfuscation based on the protocol obfuscation method using a unified description syntax.
Step 5: Using the state prediction decision tree from Step 3 and the protocol obfuscation configuration file from Step 4, obfuscate the original webpage traffic on the local morph node. Obfuscation methods include packet merging, new packet construction, data encryption, reconstruction of data according to the message format, and padding, thereby generating obfuscated traffic.
The local obfuscation node performs protocol obfuscation by message format simulation, using three obfuscation methods: encryption, reconstruction, and padding. Specifically: the original data is encrypted using the encryption method and key preset in the protocol obfuscation configuration file; the original data is reconstructed according to the message format preset in the configuration file; and the header of the original data, or data falling short of the required length, is padded with the predefined padding fields in the configuration file.
The local obfuscation node also performs protocol obfuscation by state machine simulation, using two obfuscation methods: packet merging and new packet construction. Specifically: network protocol state machines are divided into control state machines and data transmission state machines, the control state machines including handshake state machines and error handling state machines. When a data transmission state machine of the original protocol must simulate a control state machine of the target obfuscation protocol, a new packet is constructed to reproduce that control state machine. When a control state machine of the original protocol must simulate a data transmission state machine of the target obfuscation protocol, the control packet is merged with the next data transmission packet of the original protocol. When the state machine types of the original and target obfuscation protocols match, the state machine is simulated normally and only the data is replaced.
Step 6: If the obfuscated traffic passes through the monitoring network and reaches the remote restoration node, proceed to Step 7; if the obfuscated traffic cannot pass through the monitoring network, change the target webpage traffic sample and start again from Step 1.
Step 7: On a remote restoration node outside the monitoring network domain, restore the obfuscated traffic to the original web page traffic through the restoration event in the protocol obfuscation configuration file, and access the original web page server.

2. The protocol obfuscation method based on a unified description syntax according to claim 1, characterized in that, in step 1, when the input traffic is split, the IP, port, and MAC bytes are all set to 0.

3. The protocol obfuscation method based on a unified description syntax according to claim 1, characterized in that, before numbering in step 1, data packets with a payload length of 0 are removed; the numbering follows the PSH flag bit, and a full-size packet is combined with the first subsequent non-full packet carrying the PSH flag to form one fragment.

4. The protocol obfuscation method based on a unified description syntax according to claim 1, characterized in that, in step 2, the header protocol field, field separator or marker, fixed-length field, variable-length field, data type, encoding method, and status code of each complete stream data packet are extracted.

5. The protocol obfuscation method based on a unified description syntax according to claim 1, characterized in that, in step 3, a GAN model is trained on the two-dimensional grayscale image generated in step 2 to generate target traffic features for webpage traffic obfuscation.

6. The protocol obfuscation method based on a unified description syntax according to claim 1, characterized in that, in step 4, the target traffic message format is combined with a custom protocol language and standard library to create message definitions and event handlers that implement the protocol obfuscation configuration file.

7. The protocol obfuscation method based on a unified description syntax according to claim 1, characterized in that, when the obfuscated traffic described in step 6 is intercepted by the monitoring network, a timeout event is triggered to replace the protocol obfuscation configuration file; when all configuration files become invalid, the target traffic must be replaced and the process restarted from step 1.

8. The protocol obfuscation method based on a unified description syntax according to claim 1, characterized in that, in step 7, the obfuscated traffic is restored by first decrypting with the encryption method and key preset in the protocol obfuscation configuration file to obtain the payload length, then decrypting the payload and removing the padding, thereby restoring the original web page traffic and enabling access to the original web page server.