Hadoop platform and distributed processing programming model-based TCP (Transmission Control Protocol) dataflow reassembly method

A distributed processing and programming model technology, applied in the fields of error prevention/detection using a return channel, digital transmission systems, and electrical components, that reduces overhead and improves operating efficiency.

Active Publication Date: 2014-12-31
CHONGQING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

At present, algorithms for TCP stream reassembly on the Hadoop platform are still lacking.



Examples

Embodiment Construction

[0025] A non-limiting embodiment is given below in conjunction with the accompanying drawings to further illustrate the present invention. It should be understood, however, that these descriptions are exemplary only, and are not intended to limit the scope of the invention. Also, in the following description, descriptions of well-known structures and techniques are omitted to avoid unnecessarily obscuring the concept of the present invention.

[0026] As shown in Figure 1, the present invention requires one MapReduce task. The massive data is stored in HDFS in the form of blocks (64 MB by default). InputFormat is modified to map each split to key-value pairs: the input key-value pair of Map is <offset, binary packet>, and the output key-value pair is <five-tuple, timestamp+sequence number+packet payload>. The output of Map goes through the intermediate Shuffle process, which partitions, sorts, and merges the output key-value pairs. The "timestamp+sequence number+packet payload" values of the same five-tuple are gathered in the ...
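The step that turns stored capture data into <offset, binary packet> pairs can be sketched as follows. This is only an illustration in Python, not the patent's modified InputFormat: it assumes the capture is a classic little-endian libpcap file (24-byte global header, 16-byte record headers), which the text above does not state, and the name `iter_records` is hypothetical.

```python
import struct

GLOBAL_HEADER_LEN = 24                    # classic pcap global header size
RECORD_HEADER = struct.Struct("<IIII")    # ts_sec, ts_usec, incl_len, orig_len


def iter_records(data):
    """Yield (offset, timestamp_us, packet_bytes) for each pcap record.

    `offset` is the byte position of the record header in the buffer,
    playing the role of the Map input key. Assumes a little-endian
    pcap buffer that starts with the global header.
    """
    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER.size <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = RECORD_HEADER.unpack_from(data, offset)
        start = offset + RECORD_HEADER.size
        end = start + incl_len
        if end > len(data):
            break  # truncated record at the end of the buffer
        yield offset, ts_sec * 1_000_000 + ts_usec, data[start:end]
        offset = end


# Usage: two small records behind an all-zero global header.
capture = (
    bytes(GLOBAL_HEADER_LEN)
    + RECORD_HEADER.pack(1, 500, 3, 3) + b"abc"
    + RECORD_HEADER.pack(2, 0, 2, 2) + b"hi"
)
records = list(iter_records(capture))
```

The sketch reads one contiguous buffer. On HDFS a record may straddle a block boundary, so a real record reader would also need to find the first record header inside each split; that part is not shown here.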

Abstract

The invention discloses a TCP (Transmission Control Protocol) stream reassembly method based on the Hadoop platform and a distributed processing programming model. The input key-value pairs of Map are <offset, binary packet>, and the output key-value pairs are <five-tuple, timestamp+sequence number+packet payload>. The '+' operation means that the timestamp, the sequence number, and the packet payload are concatenated into one large byte array, which is stored as Hadoop's BytesWritable data type. The Map output passes through the intermediate Shuffle process, which partitions, sorts, and merges the output key-value pairs. The 'timestamp+sequence number+packet payload' values that share the same five-tuple in the Map output are gathered into key-value pairs <five-tuple, list(timestamp+sequence number+packet payload)>, which serve as the input of Reduce. The final output key-value pairs of Reduce are <five-tuple, reassembled data>. The method improves operating efficiency and reduces overhead.
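The Map, Shuffle, and Reduce stages described in the abstract can be simulated in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the patented implementation and not Hadoop code: packets arrive already parsed into dictionaries, the '+' byte layout is an 8-byte timestamp followed by a 4-byte sequence number, and all function names (`pack_value`, `map_packet`, `shuffle`, `reduce_flow`) are hypothetical. Sequence-number wraparound and missing segments are not handled.

```python
import struct
from collections import defaultdict

# Assumed layout of the '+' concatenation: timestamp (us), sequence number, payload.
HEADER = struct.Struct("!QI")


def pack_value(timestamp_us, seq, payload):
    """Concatenate timestamp + sequence number + payload into one byte array."""
    return HEADER.pack(timestamp_us, seq) + payload


def unpack_value(value):
    timestamp_us, seq = HEADER.unpack_from(value)
    return timestamp_us, seq, value[HEADER.size:]


def map_packet(packet):
    """Map: emit <five-tuple, timestamp+sequence number+payload>."""
    key = (packet["src_ip"], packet["src_port"],
           packet["dst_ip"], packet["dst_port"], "TCP")
    return key, pack_value(packet["ts"], packet["seq"], packet["payload"])


def shuffle(pairs):
    """Shuffle: gather all values that share the same five-tuple."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_flow(values):
    """Reduce: order segments by sequence number and join their payloads."""
    segments = sorted((unpack_value(v) for v in values),
                      key=lambda s: (s[1], s[0]))
    out = bytearray()
    next_seq = None
    for _ts, seq, payload in segments:
        if next_seq is None:
            next_seq = seq
        if seq + len(payload) <= next_seq:
            continue                           # retransmission, already covered
        if seq < next_seq:
            payload = payload[next_seq - seq:]  # drop the overlapping prefix
            seq = next_seq
        out += payload                          # gaps are not detected here
        next_seq = seq + len(payload)
    return bytes(out)


# Usage: one flow delivered out of order with a duplicate, plus a second flow.
flow_a = {"src_ip": "10.0.0.1", "src_port": 40000, "dst_ip": "10.0.0.2", "dst_port": 80}
flow_b = {"src_ip": "10.0.0.3", "src_port": 40001, "dst_ip": "10.0.0.2", "dst_port": 80}
packets = [
    dict(flow_a, ts=1, seq=1005, payload=b" HTTP"),
    dict(flow_a, ts=2, seq=1000, payload=b"GET /"),
    dict(flow_b, ts=2, seq=7, payload=b"xyz"),
    dict(flow_a, ts=3, seq=1010, payload=b"/1.1"),
    dict(flow_a, ts=4, seq=1000, payload=b"GET /"),
]
grouped = shuffle(map_packet(p) for p in packets)
reassembled = {key: reduce_flow(values) for key, values in grouped.items()}
```

Keying on the five-tuple lets the Shuffle stage do the flow lookup that a single-machine reassembler would do with a hash table, so each Reduce call only has to order the segments of one flow.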

Description

Technical field

[0001] The invention relates to the field of big data network traffic analysis. Specifically, it is a TCP stream reassembly method based on the Hadoop platform and a distributed processing programming model.

Background

[0002] TCP is a connection-oriented, reliable transport layer protocol that is widely used on the Internet and in networks that require high transmission reliability. Because the Internet protocol stack has many layers and the length of a single packet is limited, application layer data is likely to be split into multiple segments that are carried by multiple packets. Therefore, reassembling the TCP session is a necessary prerequisite for analyzing application layer data.

[0003] Traditional TCP reassembly technology uses data structures such as linked lists and hash tables, combined with the TCP five-tuple, acknowledgment numbers, sequence numbers, and various flag bits (...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08, H04L12/861, H04L1/16
Inventor: 雒江涛, 高伟, 杨军超, 王小平, 邓生雄, 申健, 刘勇
Owner: CHONGQING UNIV OF POSTS & TELECOMM