Message sequence clustering method of unknown binary private protocol

A message sequence clustering technology for private protocols, applied in the information field. It addresses problems such as difficult modeling, inaccurate measurement of message sequence similarity, and large numbers of overlapping frequent items, and it improves clustering accuracy.

Active Publication Date: 2019-06-28
南京赛宁信息技术有限公司

AI Technical Summary

Problems solved by technology

Sequence clustering algorithms based on probabilistic models are often difficult to model, and they are effective mainly when clustering long sequences.
Among keyword-based sequence clustering algorithms, the classic one is the Apriori algorithm. Its problem is that it produces a large number of overlapping frequent items, which makes the feature vector representing a message sequence very high-dimensional.
Because such a method ignores the keyword length of the protocol message sequence and does not consider the semantic correlation between preceding and following words when building the message embedding, it cannot accurately measure the similarity between message sequences, and the clustering effect is poor.

Embodiment Construction

[0031] The specific steps of the present invention are further described below with reference to Figure 1.

[0032] Step 1, using a data collection method to collect unknown binary private protocol message sequences.

[0033] (1a) Set the network card of the capture server to promiscuous mode so that it can monitor wireless communication data, then start communication entity A and communication entity B and let them establish a communication connection;

[0034] (1b) Use Wireshark to capture the message sequence communication data exchanged between communication entities A and B, and save it as a pcap file. This yields the unknown binary private protocol message sequence, which includes link layer data, transport layer data, and application layer data.
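
As a minimal sketch of how a capture saved in step (1b) might be read back for later processing, the following parses the classic pcap container using only the Python standard library. It is an illustration under stated assumptions, not the patent's implementation: the function names `read_pcap_packets` and `build_pcap` are hypothetical, only the classic microsecond-resolution pcap format is handled (not pcapng), and the returned records still contain all link, transport, and application layer bytes.

```python
import struct

PCAP_MAGIC = 0xA1B2C3D4  # classic pcap, microsecond timestamps


def read_pcap_packets(data: bytes) -> list[bytes]:
    """Return the raw bytes of every packet record in a classic pcap buffer."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number is written in the capturing host's byte order,
    # so reading it as little-endian tells us which order the file uses.
    magic = struct.unpack("<I", data[:4])[0]
    if magic == PCAP_MAGIC:
        endian = "<"
    elif magic == 0xD4C3B2A1:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    packets = []
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Record header: ts_sec, ts_usec, captured length, original length.
        _ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        offset += 16
        if offset + incl_len > len(data):
            break  # truncated final record; stop rather than return partial data
        packets.append(data[offset:offset + incl_len])
        offset += incl_len
    return packets


def build_pcap(packets: list[bytes]) -> bytes:
    """Hypothetical helper: build a little-endian pcap buffer for demonstration."""
    # magic, version 2.4, timezone offset, sigfigs, snaplen, link type 1 (Ethernet)
    out = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, 65535, 1)
    for pkt in packets:
        out += struct.pack("<IIII", 0, 0, len(pkt), len(pkt)) + pkt
    return out


if __name__ == "__main__":
    demo = build_pcap([b"\x01\x02\x03", b"\xff" * 10])
    for record in read_pcap_packets(demo):
        print(record.hex())
```

For a real capture, the file contents would be passed in with something like `read_pcap_packets(open(path, "rb").read())`, where `path` is whatever file was saved from Wireshark.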

[0035] Step 2, preprocessing the collected unknown binary private protocol message sequence.

[0036] (2a) Analyze the intercepted unknown binary private protocol message sequence according to the structure of the ne...

Abstract

The invention discloses a message sequence clustering method for an unknown binary private protocol. It mainly solves the problem that, in prior-art protocol reverse engineering, the similarity between protocol message sequences cannot be accurately measured. The implementation scheme comprises the following steps: 1) collecting an unknown binary private protocol message sequence; 2) preprocessing the collected message sequence; 3) extracting multi-scale N-gram features of the preprocessed message sequence; 4) reducing the dimension of the multi-scale N-gram features based on variance selection; 5) producing an embedded representation of the message sequence from the dimension-reduced multi-scale N-gram features; 6) determining the optimal clustering number K according to the embedded representation of the message sequence; and 7) clustering the message sequence according to the optimal clustering number K. The method fully mines the latent semantic information of the message sequence, can accurately measure the similarity between message sequences, and improves clustering accuracy. It can be used for clustering unknown binary private protocols.
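
Steps 3 and 4 of the abstract can be sketched roughly as follows. This is an assumption-laden illustration rather than the patented procedure: the visible text does not specify the N-gram sizes, the feature weighting, or the variance threshold, so the sizes `(1, 2, 3)`, the raw-count features, the threshold `0.0`, and the function names are all hypothetical choices. Steps 5 to 7 (embedding, choosing K, clustering) are not shown because the available text does not describe them.

```python
from collections import Counter
from statistics import pvariance


def multiscale_ngrams(message: bytes, sizes=(1, 2, 3)) -> Counter:
    """Count every byte N-gram of each requested size in one message."""
    counts = Counter()
    for n in sizes:
        for i in range(len(message) - n + 1):
            counts[message[i:i + n]] += 1
    return counts


def feature_matrix(messages, sizes=(1, 2, 3)):
    """Build a shared N-gram vocabulary and one raw-count row per message."""
    per_message = [multiscale_ngrams(m, sizes) for m in messages]
    vocab = sorted(set().union(*per_message))
    rows = [[counts[gram] for gram in vocab] for counts in per_message]
    return vocab, rows


def select_by_variance(vocab, rows, threshold=0.0):
    """Keep only the feature columns whose population variance exceeds threshold."""
    keep = [
        j for j in range(len(vocab))
        if pvariance([row[j] for row in rows]) > threshold
    ]
    kept_vocab = [vocab[j] for j in keep]
    reduced = [[row[j] for j in keep] for row in rows]
    return kept_vocab, reduced


if __name__ == "__main__":
    msgs = [b"\x01\x02\x03", b"\x01\x02\x04"]
    vocab, rows = feature_matrix(msgs, sizes=(1, 2))
    kept, reduced = select_by_variance(vocab, rows)
    print(len(vocab), "features before,", len(kept), "after variance selection")
```

N-grams that occur equally often in every message (here the shared prefix bytes) have zero variance and carry no information for separating messages, which is why dropping them shrinks the feature vector without losing discriminative power.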

Description

Technical field

[0001] The invention belongs to the field of information technology and further provides a message sequence clustering method that can be used for clustering unknown binary private protocols.

Background

[0002] A network protocol is a specification for communication between entities in a network. It defines the data format and the related synchronization rules that apply when communicating entities exchange information. In addition to standardized communication protocols, networks also carry a large number of unknown private protocols. Message sequence clustering is the first task in protocol reverse engineering: the messages of each type of private protocol are separated, as far as possible, according to the similarity between message sequences, after which field format inference and state machine inference are performed.

[0003] Packet sequence clustering of private protocols, that is, the core issu...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/06, G06K9/62
Inventors: 杨超, 吴继超
Owner: 南京赛宁信息技术有限公司