[0062] Embodiment 2 (BitTorrent protocol behavior characteristics):
[0063] First, the tracker HTTP protocol is used to interact with the tracker server:
[0064] 1) The client sends an HTTP GET request to the tracker.
[0065] The feature of this step is: a GET /announce... request sent to the tracker over HTTP/1.0, containing the keyword BitTorrent.
[0066] 2) The tracker returns to the client information about the other downloaders of the same file. The feature of this step is: a bencoded dictionary containing the list of peer addresses and ports.
[0067] 3) The BitTorrent client sends connection requests according to the obtained peer list. The feature of this step is that the connection request sent to each peer contains the keyword "BitTorrent".
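The three steps above can be illustrated with a minimal Python sketch. It is not taken from the embodiment: the function names (is_tracker_request, peers_from_tracker_response, is_peer_handshake) are hypothetical, the keyword is searched case-insensitively anywhere in the request, and the handshake check assumes the standard BitTorrent peer handshake, which begins with the byte 19 followed by the string "BitTorrent protocol".

```python
# Illustrative checks for the three BitTorrent steps; all names are hypothetical.
HANDSHAKE_PREFIX = b"\x13BitTorrent protocol"  # length byte 19 + protocol string


def is_tracker_request(payload: bytes) -> bool:
    """Step 1: an HTTP GET to an /announce path that mentions BitTorrent."""
    request_line = payload.split(b"\r\n", 1)[0]
    return (request_line.startswith(b"GET /announce")
            and b"bittorrent" in payload.lower())


def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value at index i; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                                   # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l" or c == b"d":                      # list or dictionary
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            value, i = bdecode(data, i)
            items.append(value)
        if c == b"l":
            return items, i + 1
        return dict(zip(items[0::2], items[1::2])), i + 1
    if c.isdigit():                                 # string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        return data[start:end], end
    raise ValueError("not a bencoded value")


def peers_from_tracker_response(body: bytes):
    """Step 2: extract (ip, port) pairs from a bencoded tracker reply."""
    try:
        reply, _ = bdecode(body)
    except ValueError:
        return []
    peers = reply.get(b"peers", []) if isinstance(reply, dict) else []
    if isinstance(peers, bytes):                    # compact form: 6 bytes per peer
        return [(".".join(str(b) for b in peers[k:k + 4]),
                 int.from_bytes(peers[k + 4:k + 6], "big"))
                for k in range(0, len(peers) - 5, 6)]
    return [(p[b"ip"].decode(), p[b"port"]) for p in peers
            if isinstance(p, dict) and isinstance(p.get(b"ip"), bytes)
            and isinstance(p.get(b"port"), int)]


def is_peer_handshake(payload: bytes) -> bool:
    """Step 3: the connection request to a peer carries the BitTorrent keyword."""
    return payload.startswith(HANDSHAKE_PREFIX)
```

Each function inspects a single payload, so the sketch covers only the per-packet feature of each step; tying the three steps together as a behavior sequence is a separate concern.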
[0068] Protocol feature extraction: Feature extraction is divided into two main steps. The first is static feature extraction from protocol packets. This step relies on a single data packet to make a preliminary judgment about the protocol, and covers text command format protocols, fixed header format protocols, and protocols with no fixed format. In this step, as many feature fields as possible are extracted from the protocol data packet to narrow the scope of behavior feature matching. The second is the extraction of protocol operating behavior features. This step applies when a single data packet cannot effectively identify information such as the protocol type or version; the actual operation process must then be monitored and its features extracted to determine more accurately the specific protocol type and version number in use. Behavior feature matching targets the detailed behaviors and actions of the protocol during a stage of its operation, so its accuracy is higher.
[0069] The protocol behavior feature rule set is tied to specific protocol types and versions. The main purposes of establishing protocol behavior feature rule sets for the various types of protocols are as follows:
[0070] 1) The protocol behavior feature rule set can verify the correctness of the static protocol rule matching result; that is, after static rule matching, the set of possible protocol types or software in use is narrowed to a single specific identification result.
[0071] 2) Starting from the protocol type judged by static protocol rule matching, details such as the specific protocol version in use can be identified, ensuring the correctness of the subsequent protocol analysis results.
[0072] 3) Matching protocol behavior features allows in-depth inspection or auditing of the operating events and actions of a specific protocol or software. Only messages that have passed both static rule matching and behavior feature matching can accurately pinpoint the specific protocol or software used in the communication.
[0073] The protocol behavior feature rule set established for a given type of protocol is described with a control flow graph (CFG) model. As shown in Figure 3, in the CFG representation each step of the protocol operating behavior features is represented by an elliptical node. Apart from the two special rules TRUE and FALSE, which return the protocol matching result, every verification rule is a Boolean expression whose execution result can only be true or false. Execution of the protocol verification rule set starts from the root node. If the result of the current verification rule is true, the verification rule subtree on the left is executed; if it is false, the subtree on the right is executed, until execution reaches a TRUE or FALSE node. Figure 3 is an example that defines the behavior feature rule set of the BitTorrent protocol. Execution of the rule set starts from the root node, and the BitTorrent protocol ID is returned only if an IP message passes the matching of the behavior feature sequence; otherwise FALSE is returned. The size of the behavior feature sequence established for a protocol feature model directly affects the accuracy and efficiency of protocol identification: when a type of protocol has more static feature and behavior feature sequence entries, the accuracy of the identification result is higher but the efficiency of identification is lower; when it has fewer entries, identification efficiency is high but the accuracy of the result may be reduced. Therefore, the protocol verification rule set should be defined reasonably according to need.
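The rule-tree execution described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names Rule and Terminal, the evaluate function, and the three example rules are hypothetical, and the tree built here is a guess at the general shape of such a rule set, not a reproduction of Figure 3.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Terminal:
    """TRUE/FALSE leaf: a protocol ID for TRUE, None for FALSE."""
    protocol_id: Optional[str]


@dataclass(frozen=True)
class Rule:
    """A Boolean verification rule with a left (true) and right (false) subtree."""
    name: str
    check: Callable[[dict], bool]
    on_true: Union["Rule", Terminal]
    on_false: Union["Rule", Terminal]


def evaluate(node: Union[Rule, Terminal], observed: dict) -> Optional[str]:
    """Walk the tree from the root until a TRUE or FALSE leaf is reached."""
    while isinstance(node, Rule):
        node = node.on_true if node.check(observed) else node.on_false
    return node.protocol_id


# Hypothetical rule set: each rule tests one observed behavior flag.
FALSE = Terminal(None)
TRUE = Terminal("BitTorrent")
handshake = Rule("peer handshake seen", lambda o: o.get("handshake", False), TRUE, FALSE)
peer_list = Rule("tracker returned peers", lambda o: o.get("peer_list", False), handshake, FALSE)
root = Rule("tracker announce seen", lambda o: o.get("announce", False), peer_list, FALSE)

result = evaluate(root, {"announce": True, "peer_list": True, "handshake": True})
```

Adding more rules to the tree lengthens the path an IP message must pass, which is the accuracy versus efficiency trade-off the paragraph describes.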
[0074] C. Protocol intelligent analysis and correction stage, as shown in Figure 2:
[0075] For the determined protocol type, the corresponding analysis method is used to parse the data. If the parsed format result contains an error, the intelligent analysis correction method retries the parsing until a more accurate result is obtained. In a real network communication environment, especially with certain proprietary protocols, an upgrade or change of software version usually brings changes in the parsing format and method. In this case it is unrealistic to expect a uniform, universally applicable parsing format and method. Even if the protocol type and the related version information have been determined by the preceding modules, that determination applies only to software versions that currently exist. Many software products are upgraded very frequently, so parsing support for existing versions often cannot keep up with the pace of software updates. If a comprehensive analysis were required for each new or unknown version, the workload would be very large and much of it repetitive. In fact, the structural change to the protocol in such an upgrade is very small, and this device uses the intelligent analysis and correction method to avoid unnecessary duplication of work.
[0076] In the actual analysis process, the main changes to the protocol include the following aspects:
[0077] 1. Change of field size
[0078] 2. Change of field offset
[0079] 3. Changes in the order of fields
[0080] The main purpose of the protocol intelligent analysis correction attempt module is to automatically analyze and accommodate the changes that some proprietary software makes to the data packet format of its protocol across version changes or in certain specific behaviors. When parsing errors are caused by such changes, the workload of re-analysis is greatly reduced, so that, once the protocol type has been determined, the understanding of protocol relevance provides greater accuracy and flexibility for the specific analysis.
[0081] Intelligent analysis correction is used when a protocol cannot be parsed accurately. The choice of trial range for the correction attempts affects both the accuracy and the efficiency of protocol analysis: the wider the trial range, the more software or protocol types and versions can be parsed correctly, but the lower the efficiency. With a narrower trial range, the accuracy of in-depth analysis results for a specific type or version is poorer, but the efficiency is higher. Users are advised to set an appropriate correction range based on their understanding of the specific protocol being analyzed and its possible changes.
[0082] The device uses the following algorithms:
[0083] 1. Fast matching of protocol static feature rules;
[0084] After the static feature rules of the various protocol types have been defined in the protocol sample extraction stage, a multi-pattern matching algorithm is used in the protocol identification stage to quickly match the static features of the IP message application data, so as to find the set of possible protocol types to which the IP message belongs. The fast matching process works as follows: the application layer payload of the IP packet serves as the text of the multi-pattern matching algorithm, and the set of all extracted protocol static features serves as the pattern set. The multi-pattern matching algorithm finds all possible protocol types, and the protocol behavior feature matching module is then called to eliminate wrong protocol types until a suitable protocol type is found.
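The text does not name a particular multi-pattern matching algorithm. As one possible illustration, the sketch below uses the well-known Aho-Corasick automaton, with the payload as the text and the static features as the pattern set. The class name MultiPatternMatcher, the method candidates, and the example patterns and labels are all hypothetical.

```python
from collections import deque


class MultiPatternMatcher:
    """Aho-Corasick automaton mapping byte patterns to protocol labels."""

    def __init__(self, patterns: dict):
        self.goto = [{}]      # per-state transitions: byte value -> next state
        self.fail = [0]       # per-state failure link
        self.out = [set()]    # per-state labels of patterns ending here
        for pattern, label in patterns.items():
            state = 0
            for byte in pattern:
                nxt = self.goto[state].get(byte)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto[state][byte] = nxt
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                state = nxt
            self.out[state].add(label)
        # Breadth-first pass to fill in failure links and inherited outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for byte, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and byte not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(byte, 0)
                self.out[nxt] |= self.out[self.fail[nxt]]

    def candidates(self, payload: bytes) -> set:
        """Scan the payload once and return every label whose pattern occurs."""
        state, found = 0, set()
        for byte in payload:
            while state and byte not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(byte, 0)
            found |= self.out[state]
        return found


# Hypothetical static feature set; labels are illustrative only.
matcher = MultiPatternMatcher({
    b"GET ": "HTTP",
    b"GET /announce": "BitTorrent-tracker",
    b"\x13BitTorrent protocol": "BitTorrent-peer",
})
possible = matcher.candidates(b"GET /announce?info_hash=x HTTP/1.0\r\n\r\n")
```

The returned set is the candidate protocol set; in the scheme described above it would then be handed to behavior feature matching to eliminate the wrong candidates.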
[0085] 2. Establishment and matching of protocol behavior characteristics rules;
[0086] In the process of extracting protocol behavior feature rules, data mining is performed on a large number of collected protocol samples, and association rules and self-learning methods are used to gradually extract and refine the behavior feature sequences. For the sake of efficiency, the behavior feature sequences generated for different protocol operation processes differ in size. The length of a behavior feature sequence can be determined according to the specific accuracy requirements, and, if necessary, multiple behavior feature sequences can be matched for different behaviors of a specific protocol. Within the protocol set output by static feature matching, the multi-pattern matching algorithm is used to match against all the protocol behavior feature sequence sets until detailed information such as the specific protocol type and version is determined.
[0087] 3. Intelligent protocol analysis and correction algorithm;
[0088] After the protocol type has been determined through static feature matching and behavior feature matching, the intelligent protocol analysis and correction module is called whenever a data packet cannot be parsed correctly. The main method is cyclic traversal verification: the possible cases of field size change, field offset change, and field order change are verified one by one until a more detailed protocol analysis result is obtained. Because of the loop traversal, this module has a more noticeable impact on efficiency, so the correction range must be set appropriately.
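A minimal Python sketch of cyclic traversal verification is given below. It rests on several assumptions that the text does not state: a message is modeled as consecutive big-endian unsigned fields, the correction range is a small set of candidate field sizes plus a maximum start offset, and correctness is judged by a caller-supplied validity check. The names parse, candidate_layouts, parse_with_correction, KNOWN_LAYOUT, and the example packet are hypothetical.

```python
from itertools import permutations, product


def parse(payload: bytes, layout, start: int = 0):
    """Read consecutive big-endian fields; return None if the payload is too short."""
    fields, pos = {}, start
    for name, size in layout:
        if pos + size > len(payload):
            return None
        fields[name] = int.from_bytes(payload[pos:pos + size], "big")
        pos += size
    fields["_body"] = payload[pos:]   # whatever follows the header fields
    return fields


def candidate_layouts(layout, sizes=(1, 2, 4), max_shift=2):
    """Yield (start_offset, layout) variants within the correction range."""
    names = [name for name, _ in layout]
    yield 0, list(layout)                                    # known layout first
    for shift in range(max_shift + 1):                       # field offset change
        for order in permutations(names):                    # field order change
            for combo in product(sizes, repeat=len(names)):  # field size change
                yield shift, list(zip(order, combo))


def parse_with_correction(payload, layout, is_valid, sizes=(1, 2, 4), max_shift=2):
    """Try each variant in turn until one parses and passes the validity check."""
    for start, variant in candidate_layouts(layout, sizes, max_shift):
        fields = parse(payload, variant, start)
        if fields is not None and is_valid(fields):
            return fields, (start, variant)
    return None, None


# Hypothetical known layout, and a "new version" packet whose length
# field has grown from 2 bytes to 4 bytes.
KNOWN_LAYOUT = [("magic", 2), ("msg_type", 1), ("length", 2)]


def is_valid(fields) -> bool:
    return fields["magic"] == 0xBEEF and fields["length"] == len(fields["_body"])


new_version_packet = b"\xBE\xEF\x01" + (5).to_bytes(4, "big") + b"hello"
fields, found = parse_with_correction(new_version_packet, KNOWN_LAYOUT, is_valid)
```

The number of variants grows as (max_shift + 1) × n! × |sizes|^n for n fields, which illustrates why the correction range has to be chosen with care.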
[0089] An intelligent protocol analysis device, as shown in Figure 4, includes a protocol static rule library, a protocol behavior feature model library, a protocol static rule matching engine, a protocol behavior feature matching engine, and an automated adjustment analysis attempt module. The protocol static rule library is connected to the protocol static rule matching engine; the protocol behavior feature model library is connected to the protocol behavior feature matching engine; the protocol behavior feature matching engine is connected to the protocol analysis engine; and the protocol analysis engine is connected to the intelligent analysis correction attempt module.
[0090] Among them, the protocol static rule library and the protocol behavior feature model library respectively store the static matching rules established in the protocol feature model stage and the behavior feature sequence extracted according to the actual running process of the protocol or software. The protocol static rule matching engine implements fast matching algorithms for all data field features that can be matched in a single data packet. The protocol behavior feature matching engine needs to record a series of actions and states during the protocol operation to match the established behavior feature sequence.
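The state recording performed by the behavior feature matching engine might look like the following Python sketch. It is an assumption-laden illustration: the class BehaviorSequenceMatcher, the event names, and the choice to key state by client address (rather than by connection) are all hypothetical, and events are matched as an ordered subsequence, so unrelated events in between are tolerated.

```python
from collections import defaultdict


class BehaviorSequenceMatcher:
    """Track, per flow key, how far each behavior feature sequence has advanced."""

    def __init__(self, sequences: dict):
        # sequences: protocol label -> ordered list of event names
        self.sequences = sequences
        # progress[flow_key][label] = number of sequence events seen so far
        self.progress = defaultdict(lambda: defaultdict(int))

    def observe(self, flow_key, event: str) -> list:
        """Record one event; return labels whose full sequence is now complete."""
        completed = []
        for label, sequence in self.sequences.items():
            index = self.progress[flow_key][label]
            if index < len(sequence) and sequence[index] == event:
                index += 1
                self.progress[flow_key][label] = index
                if index == len(sequence):
                    completed.append(label)
        return completed


# Hypothetical behavior feature sequence for BitTorrent.
engine = BehaviorSequenceMatcher({
    "BitTorrent": ["tracker_announce", "tracker_peer_list", "peer_handshake"],
})
first = engine.observe("10.0.0.5", "tracker_announce")
second = engine.observe("10.0.0.5", "tracker_peer_list")
third = engine.observe("10.0.0.5", "peer_handshake")
```

Keying by client address is used here because the tracker exchange and the peer connections travel over different connections; a real engine could choose a different key.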