Binary unknown protocol message format division method based on sequence alignment

A sequence alignment and protocol message technology, applied in digital transmission systems, electrical components, transmission systems, etc. It addresses problems such as high time complexity, the inapplicability of existing schemes to binary protocols, and the lack of a basis and method for format division, with the effects of reduced time complexity and accurate, efficient format inference.

Active Publication Date: 2018-10-26
SOUTHEAST UNIV

AI Technical Summary

Problems solved by technology

However, existing format inference schemes for private protocol messages have some defects.
The PI project uses the unsupervised UPGMA method for hierarchical clustering, which has high time complexity.
Although Discoverer reduces the time complexity by constructing message attribute sequences, its approach of splitting message samples at common text-type delimiters is not suitable for binary protocols.
[0004] The existing binary protocol division meth



Examples


Embodiment

[0031] As shown in Figure 1, this embodiment provides a binary unknown protocol message format division method based on sequence alignment. The input is the network traffic set G whose format is to be divided (G is protocol data that has been preprocessed and contains only a single protocol). Each element g in G is a character sequence in an unknown protocol format, and the length of the longest character sequence is Len.

[0032] The processing steps are as follows:

[0033] 1. Initialize the alignment result record sequence Seq[n], where n = 1, 2, ..., 2*Len and each n denotes a position in the protocol sequence.

[0034] After initialization, every position of Seq[n] has the initial value 0. 2*Len is the maximum length of the alignment result record sequence Seq; in actual processing, the length of Seq may be less than 2*Len. In the division of the protocol format, it is necessary to consider mining and retaining the diversity features in the sequence dat...
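The initialization in step 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `init_result_sequence` and the sample messages are hypothetical, and Python lists are 0-indexed whereas the text numbers positions from 1.

```python
def init_result_sequence(G):
    """Return (Len, Seq) for a collection G of same-protocol messages.

    Len is the length of the longest message; Seq is the alignment
    result record sequence, zero-filled with the maximum length 2*Len.
    (Hypothetical helper name, used only for this sketch.)
    """
    if not G:
        raise ValueError("G must contain at least one message")
    Len = max(len(g) for g in G)
    Seq = [0] * (2 * Len)  # Seq[n-1] here corresponds to Seq[n] in the text
    return Len, Seq


# Made-up sample messages standing in for preprocessed traffic.
G = [bytes.fromhex("01020304"), bytes.fromhex("0102ff0304aa")]
Len, Seq = init_result_sequence(G)
```

Allocating the full 2*Len up front matches the stated upper bound; the portion actually used after alignment may be shorter.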



Abstract

The invention discloses a binary unknown protocol message format division method based on sequence alignment. The method comprises the following main steps: obtaining a set of protocol sequences of a single type through preprocessing; setting up a result sequence; performing global sequence alignment and local sequence alignment on the protocol sequences in pairs; combining the global sequence alignment results; recording the local sequence alignment results as similarity; integrating the alignment results into the result sequence; and dividing the message format according to the result sequence. Compared with schemes that use methods such as hierarchical clustering, the method has lower algorithmic time complexity. It also effectively alleviates the sliding of field positions that existing schemes suffer when too many gaps are inserted during sequence alignment, and it has better accuracy and practicability.
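As a rough illustration of the pairwise alignment step, the sketch below aligns two byte sequences globally and scores them locally. The abstract does not name the alignment algorithms or scoring parameters, so Needleman-Wunsch (global), Smith-Waterman (local), and the match/mismatch/gap scores are assumptions made for this example. How the results are merged into the result sequence is not shown, because that detail is not available in the text here.

```python
GAP = None  # marker for an inserted gap in the aligned output


def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment (assumed algorithm and scores).

    Returns (score, aligned_a, aligned_b); aligned lists have equal length.
    """
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + sub,
                          s[i - 1][j] + gap,
                          s[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return s[n][m], out_a[::-1], out_b[::-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman best local alignment score (assumed algorithm)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(0, prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


# Made-up byte sequences standing in for two messages of one protocol.
g_score, al_a, al_b = global_align(bytes([1, 2, 3, 4]), bytes([1, 2, 4]))
l_score = local_score(bytes([9, 9, 1, 2, 3]), bytes([1, 2, 3, 7]))
```

The global alignment shows where gaps fall between two whole messages, while the local score serves as a simple similarity measure between them, in the spirit of "recording local sequence alignment results as similarity".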

Description

Technical field

[0001] The invention belongs to the technical field of network protocol analysis, and in particular relates to a binary unknown protocol message format division method based on sequence alignment.

Background

[0002] In 1967, R. A. Scantlebury and K. A. Bartlett of England's National Physical Laboratory first used the English word "protocol" in a memo to describe the process of data communication. Today, standardization organizations, network communication solution providers, and network operators have all formulated public protocols. As the name implies, the specifications of such protocols are public and their data formats are known. Examples include the Hypertext Transfer Protocol, most commonly used when mobile apps interact with their back ends, and the Dynamic Host Configuration Protocol, used when home routers assign addresses. At the same time, for the purpose of commercial intere...


Application Information

IPC(8): H04L29/06, H04L12/26
CPC: H04L43/18, H04L69/03, H04L69/06
Inventors: 秦中元, 陆凯
Owner: SOUTHEAST UNIV