A method and device for storing massive network flow data

A network flow data storage technology, applied in the field of massive flow data storage and query, that addresses data expansion, growing storage overhead, and unsatisfactory loading speed, thereby preserving query performance while reducing loading and storage overhead.

Active Publication Date: 2017-07-11
INST OF INFORMATION ENG CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

First, there is the problem of data expansion. Because HBase uses column-oriented storage, loading the original data into HBase adds a large amount of column and column family information, which causes serious data expansion and drives the storage cost up sharply.
Second, the single-point loading capability of HBase is generally on the order of milliseconds per record. Because NetFlow flow data arrives very quickly, the existing loading speed of HBase cannot meet the requirements of a real environment.

Examples

Embodiment 1

[0046] A storage method for massive network flow data, comprising the following steps:

[0047] Step 1: Collect the data query requests submitted by users within any period of time, and obtain the query conditions from those requests; the data query requests are used to query the data to be queried;

[0048] A query request is a query statement submitted by a user within a period of time in a real environment;

[0049] The SQL statements all resemble the following: SELECT a FROM table1 WHERE a=2; here the trailing WHERE clause is the query condition;

[0050] The WHERE clause is extracted by direct hard parsing: take the query statement, examine it word by word, and once the keyword WHERE is reached, capture the conditions that follow it;
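
The word-by-word extraction described in [0050] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `extract_where_clause` is hypothetical, and the sketch assumes a single statement with one WHERE keyword separated from its neighbours by whitespace.

```python
from typing import Optional


def extract_where_clause(sql: str) -> Optional[str]:
    """Return the condition text after the first WHERE keyword, or None.

    Hypothetical helper: it scans the statement word by word, as the
    text describes, and does not handle subqueries or quoted keywords.
    """
    # Drop surrounding whitespace and a trailing statement terminator.
    words = sql.strip().rstrip(";").split()
    for i, word in enumerate(words):
        if word.lower() == "where":
            # Everything after WHERE is treated as the query condition.
            return " ".join(words[i + 1:])
    return None


if __name__ == "__main__":
    print(extract_where_clause("SELECT a FROM table1 WHERE a=2;"))
```

A statement without a WHERE keyword yields `None`, so callers can skip requests that carry no query condition.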

[0051] Step 2: Analyze the time attribute and feature attribute in the query condition, count the time span of the time attribute and the frequency of occurrence of each feature attribute, and select the feature attr...
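
The frequency count in Step 2 might look like the sketch below. It covers only the counting of feature attributes, not the time-span statistics. Several points are assumptions rather than details from the source: the function name `select_cluster_attributes`, the regular expression used to spot attribute names, counting each attribute at most once per query, and expressing the preset threshold as a raw count.

```python
import re
from collections import Counter

# Assumption: an attribute name is an identifier immediately to the
# left of a comparison operator in the extracted WHERE clause.
ATTRIBUTE_PATTERN = re.compile(r"([A-Za-z_]\w*)\s*(?:<=|>=|!=|<>|=|<|>)")


def select_cluster_attributes(where_clauses, time_attribute, threshold):
    """Return feature attributes that occur in more than `threshold` queries."""
    counts = Counter()
    for clause in where_clauses:
        names = set(ATTRIBUTE_PATTERN.findall(clause))
        # The time attribute is analyzed separately, so exclude it here.
        names.discard(time_attribute)
        counts.update(names)
    # Sorted so the result does not depend on set iteration order.
    return sorted(name for name, n in counts.items() if n > threshold)


if __name__ == "__main__":
    clauses = [
        "src_ip = '10.0.0.1' AND ts >= 100",
        "src_ip = '10.0.0.2' AND dst_port = 80",
        "src_ip = '10.0.0.3' AND ts < 500",
    ]
    print(select_cluster_attributes(clauses, "ts", 2))
```

The column names `src_ip`, `dst_port`, and `ts` are made up for the example and do not come from the patent text.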

Abstract

The invention relates to a method and device for storing massive network flow data. The storage method comprises: collecting the data query requests submitted by users within any period of time and obtaining query conditions from those requests; analyzing the time attribute and feature attributes in the query conditions and selecting, as the cluster attribute, a feature attribute whose frequency of occurrence exceeds a preset threshold; determining the number and end points of the segments of the data to be queried, determining the size of the write cache region according to the memory space occupied by the data to be queried in the segments, and generating a configuration file from the cluster attribute, the number and end points of the segments, and the size of the write cache region; and having a collector receive the network flow data and send it to a file server, which stores the received network flow data according to the configuration file. In this way the query conditions are mapped directly onto the corresponding space partitions, so that data can be written or queried directly; loading and storage overhead is minimized and query performance is preserved.
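
As a rough illustration of how such a configuration file could drive storage, the sketch below builds a configuration and maps a record's cluster-attribute value to a segment. Everything concrete here is an assumption: the abstract does not give the file format, the field names, or the segmenting rule. The sketch uses a JSON-serializable dictionary, treats the end points as numeric interior boundaries (so k end points give k + 1 segments), and uses hypothetical names `build_config` and `segment_index`.

```python
import bisect
import json


def build_config(cluster_attribute, end_points, cache_bytes):
    """Assemble a hypothetical configuration from the three inputs."""
    end_points = sorted(end_points)
    return {
        "cluster_attribute": cluster_attribute,
        # Assumption: end points are interior boundaries between segments.
        "segment_count": len(end_points) + 1,
        "segment_end_points": end_points,
        "write_cache_bytes": cache_bytes,
    }


def segment_index(config, value):
    """Return the zero-based segment that a cluster-attribute value falls in."""
    # A value equal to an end point is placed in the segment above it.
    return bisect.bisect_right(config["segment_end_points"], value)


if __name__ == "__main__":
    config = build_config("dst_port", [1024, 49152], 4 * 1024 * 1024)
    print(json.dumps(config, indent=2))
    print(segment_index(config, 80))
```

Under these assumptions, a file server could use `segment_index` on write to choose the partition for each flow record, and the same lookup on read to go straight to the partition a query condition refers to.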

Description

Technical field

[0001] The invention relates to the field of massive flow data storage and query, in particular to a method and device for storing massive network flow data.

Background art

[0002] NetFlow is a network protocol released by Cisco in 1996 for collecting and monitoring network flow data. Because it supports several key services for applications, including network data collection, network traffic statistics, denial-of-service monitoring, and intrusion detection, it has high application value and practical significance.

[0003] Relational databases have been widely used as the traditional solution for managing NetFlow flow data. Thanks to mature indexing and query mechanisms, a database has clear advantages in query processing. However, as the data scale keeps growing, database solutions face serious challenges in scalability and data storage. First of all, NetFlow flow data arrives at a fast speed, and the loading speed of...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/183, G06F16/22, G06F16/24568
Inventor: 陈重韬, 王伟平, 孟丹, 胡斌, 崔甲
Owner: INST OF INFORMATION ENG CHINESE ACAD OF SCI