A real-time summary generating method for streaming data

A streaming data summary technology, applied in the field of data stream identification, that addresses missing, out-of-order and overlapping data, and achieves reduced memory usage and a lower hash collision rate.

Active Publication Date: 2017-02-15
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0010] The traditional fuzzy hash algorithm (Identifying almost identical files using context triggered piecewise hashing, 2006) is suitable for offline data processing, but it cannot be applied to streaming data in transit, because such data may be incomplete, out of order, or overlapping.



Examples


Embodiment Construction

[0048] In the SFH algorithm, an input data block, as shown in Figure 1, may be split into three parts:

[0049] 1) A fragment (shard) is the data between two reset points; its fuzzy hash value can be calculated directly;

[0050] 2) The left truncated data is the data from the beginning of the data block to the first reset point. Its fuzzy hash value cannot be calculated directly: the first w-1 bytes must be kept in the buffer, while the matrix product of the remaining bytes (leftstate) can be calculated;

[0051] 3) The right truncated data is the data from the last reset point in the data block to the end of the data. It is part of a fragment that has not yet fully arrived, so its fuzzy hash value cannot be calculated directly, but the matrix product of the data received so far (mapping hash state) can be calculated;
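The three-way split described above can be sketched in Python. This is only an illustration under stated assumptions, not the patent's implementation: the rolling hash (a plain sum over the last `w` bytes), the trigger condition (`value % modulus == modulus - 1`), the names `rolling_values` and `split_block`, and the choice to place the reset-point byte at the end of the preceding piece are all hypothetical. The sketch also shows why the first w-1 bytes of a block are special: no full window ends there, so no reset point can be detected until the preceding bytes are known.

```python
def rolling_values(data: bytes, w: int):
    """Yield (i, value) for every index i at which a full w-byte window ends.

    The rolling value here is simply the sum of the window (an assumption
    made for illustration; the source does not specify the rolling hash).
    """
    s = sum(data[:w - 1])
    for i in range(w - 1, len(data)):
        s += data[i]              # window now covers data[i-w+1 .. i]
        yield i, s
        s -= data[i - w + 1]      # drop the oldest byte before sliding on


def split_block(data: bytes, w: int = 7, modulus: int = 64):
    """Split one block into (left_truncated, fragments, right_truncated).

    A reset point is assumed to be any index whose rolling value satisfies
    value % modulus == modulus - 1. The reset-point byte is kept at the end
    of the piece that precedes it (also an assumption).
    """
    resets = [i for i, v in rolling_values(data, w) if v % modulus == modulus - 1]
    if not resets:
        # No reset point seen: treat the whole block as left truncated data
        # (a choice made for this sketch only).
        return data, [], b""
    left = data[:resets[0] + 1]
    fragments = [data[a + 1:b + 1] for a, b in zip(resets, resets[1:])]
    right = data[resets[-1] + 1:]
    return left, fragments, right


block = bytes([0, 0, 0, 1, 1, 1, 0, 0, 3, 1, 1, 0, 0])
left, fragments, right = split_block(block, w=3, modulus=4)
print(left, fragments, right)
```

In this sketch only the pieces in `fragments` are bounded by reset points on both sides; `left` and `right` would have to wait for neighbouring blocks before they can be completed.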

[0052] The complete processing flow mainly includes the following basic operations (multiplication and product below refer to matrix m...


Abstract

The invention provides a real-time summary generating method for streaming data. The streaming data summary is calculated by storing intermediate calculation results, so that missing or damaged, out-of-order and overlapping data can be handled; matrix multiplication is used as the strong hash algorithm to reduce memory occupation, so that the method can calculate streaming data summaries in real time with less memory.
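To illustrate why a matrix product lends itself to storing intermediate results, here is a minimal Python sketch. It is not the patent's construction: the byte-to-matrix mapping `byte_matrix`, the modulus `P`, and the function names are hypothetical choices for illustration. The property it relies on is general: matrix multiplication is associative, so the product for a piece of data can be computed on its own, stored as a fixed-size state, and later multiplied with the states of its neighbours.

```python
P = 65521  # a prime modulus, chosen only for this illustration

IDENTITY = (1, 0, 0, 1)  # 2x2 identity matrix stored as (m11, m12, m21, m22)


def byte_matrix(b: int):
    """Hypothetical mapping from a byte to an invertible 2x2 matrix mod P."""
    return (b + 1, 1, 1, 0)  # determinant is -1, so the matrix is invertible


def mat_mul(a, b):
    """Multiply two 2x2 matrices modulo P."""
    a11, a12, a21, a22 = a
    b11, b12, b21, b22 = b
    return (
        (a11 * b11 + a12 * b21) % P,
        (a11 * b12 + a12 * b22) % P,
        (a21 * b11 + a22 * b21) % P,
        (a21 * b12 + a22 * b22) % P,
    )


def state(data: bytes):
    """Ordered product of the matrices of all bytes in data."""
    acc = IDENTITY
    for b in data:
        acc = mat_mul(acc, byte_matrix(b))
    return acc


# Two pieces of one stream can be processed separately and combined later.
first, second = b"streaming ", b"data summary"
combined = mat_mul(state(first), state(second))
print(combined == state(first + second))
```

In this sketch the stored state is four integers regardless of how many bytes it covers, and the product is order-sensitive, so swapping two pieces generally changes the result.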

Description

Technical field

[0001] The invention relates to the field of data stream identification, and is a real-time summary generation method for streaming data.

Background technique

[0002] As network traffic continues to increase, identifying data in the traffic is becoming more and more important for data leakage prevention (Data Leakage Protection, DLP), security defense and other requirements. For example, Trojan horses, viruses, pornographic videos, and internal files can be identified from network traffic. If they can be identified during transmission, they can be audited and disposed of at an early stage.

[0003] To identify data in traffic, a general approach is to generate summaries of data in the network. However, network traffic involves many complex application protocol scenarios, such as online video playback and network disk file download, and data of this kind is often captured out of order or incompletely.

[0004] The current...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06N7/02
CPC: G06F16/24568, G06N7/02, H04L9/0643, H04L2209/08, H04L63/1408, H04L63/145, G06F17/16, H04L9/3236
Inventor: 郑超, 李响, 刘庆云, 李舒, 杨威, 张成伟, 汤琦
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI