Massive structured data storage and query methods and systems supporting high-speed loading

A structured data storage and query technology, applied in electrical digital data processing and special data processing applications. It addresses the problems that HIVE has no index, cannot meet query needs, and cannot directly query streaming record data, with the effect of improving query efficiency.

Active Publication Date: 2012-06-27
国信电子票据平台信息服务有限公司

AI Technical Summary

Problems solved by technology

As a result, HIVE's real-time query efficiency is low and its latency is high. Online data loading with fast querying cannot be achieved, and query applications that do not require large result sets are poorly served.
[0006] (2) HIVE has no index; all of its query operations read the raw data files directly, so query efficiency is low.
[0007] (3) In HIVE's query process, the user describes the query rules in HQL (a query language similar to SQL), and the query is then executed through MapReduce-oriented task decomposition (that is, multiple disk write and read operations are required when executing query tasks), so ...




Embodiment Construction

[0024] In the distributed storage method and query method for massive structured data of the present invention, the data structure consists of two basic parts: a fully sorted index and record data. The fully sorted index sorts all attribute values of the records in dictionary order. The record data stores each record sequentially, row by row. The fully sorted index supports filter conditions, such as those in a WHERE clause.
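
The two-part structure in [0024] can be illustrated with a minimal single-process sketch. It is not the patented implementation: the class name `SortedIndexStore`, the `(attribute, value, row_id)` entry layout, and the sample field names are assumptions made only for illustration, and distribution across nodes is omitted.

```python
import bisect


class SortedIndexStore:
    """Row-ordered record data plus one sorted index over all attribute values.

    Hypothetical illustration; the entry layout is an assumption.
    """

    def __init__(self):
        self.records = []  # record data: rows stored sequentially
        self.index = []    # sorted (attribute, value-as-string, row_id) entries

    def load(self, row):
        """Append a row and index every one of its attribute values."""
        row_id = len(self.records)
        self.records.append(row)
        for attr, value in row.items():
            # Values are compared as strings, i.e. in dictionary order.
            bisect.insort(self.index, (attr, str(value), row_id))

    def where_equals(self, attr, value):
        """Answer a filter such as WHERE attr = value using only the index."""
        key = str(value)
        lo = bisect.bisect_left(self.index, (attr, key, -1))
        hi = bisect.bisect_right(self.index, (attr, key, len(self.records)))
        return [self.records[row_id] for _, _, row_id in self.index[lo:hi]]


store = SortedIndexStore()
store.load({"src": "10.0.0.1", "port": 80})
store.load({"src": "10.0.0.2", "port": 443})
store.load({"src": "10.0.0.1", "port": 22})
rows = store.where_equals("src", "10.0.0.1")
```

Because the index is sorted, the filter is answered with two binary searches rather than a scan of every stored row, which is the contrast with an index-free engine that reads raw data files.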

[0025] Before describing the present invention in detail, the related concept of a "batch query" is defined. A batch query is a query task with a large result set in which, according to user needs, a single query returns a small part of the result set, or multiple queries return the entire result set.
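
One way to picture the batch query defined in [0025] is a generator that hands back the result set one batch at a time. This is only a sketch: the function name `batch_query` and the `batch_size` parameter are hypothetical and do not come from the patent text.

```python
def batch_query(matching_rows, batch_size):
    """Yield the result set in consecutive batches of at most batch_size rows."""
    for start in range(0, len(matching_rows), batch_size):
        yield matching_rows[start:start + batch_size]


results = list(range(10))  # stand-in for a large result set
batches = batch_query(results, batch_size=4)

# A single query returns a small part of the result set.
first = next(batches)

# Repeated queries return the remainder, so the whole set is eventually obtained.
rest = [row for batch in batches for row in batch]
```

The caller decides whether to stop after the first batch or to keep requesting batches until the result set is exhausted.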

[0026] The present invention will be described below in conjunction with the accompanying drawings and specific embodiments.

[0027] In the query me...


Abstract

The invention provides a storage method, a query method, a storage system and a query system for massive structured data, all of which support high-speed loading. The distributed storage method for massive structured data comprises the following steps: receiving data loaded at high speed from a user; caching the loaded data in a distributed manner using a double-sliding window structure; and storing the cached data in a distributed manner after a fixed period. Because newly loaded data is cached, the method can improve query efficiency for applications, such as streaming data applications, whose later queries frequently use recently loaded data.
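
The abstract does not spell out how the double-sliding window works, so the following is a sketch under stated assumptions rather than the patented design. It assumes two in-memory windows: one receiving new records and one holding the previous period's records, with the older window written to storage when a period ends. The class name `DoubleWindowCache`, its methods, and the in-memory `stored` list (a stand-in for distributed storage) are all hypothetical.

```python
class DoubleWindowCache:
    """Two cache windows that rotate at the end of each fixed period (assumed behavior)."""

    def __init__(self):
        self.current = []   # window receiving newly loaded records
        self.previous = []  # window from the preceding period, still cached
        self.stored = []    # stand-in for distributed persistent storage

    def load(self, record):
        """Cache a newly loaded record in the current window."""
        self.current.append(record)

    def end_period(self):
        """Assumed rotation: persist the older window, then slide both windows."""
        self.stored.extend(self.previous)
        self.previous = self.current
        self.current = []

    def recent(self):
        """Recently loaded records that can be served from the cache."""
        return self.previous + self.current


cache = DoubleWindowCache()
cache.load("a")
cache.end_period()
cache.load("b")
cache.end_period()
cache.load("c")
```

In this sketch, queries over recently loaded data read from `recent()` without touching storage, which is the benefit the abstract attributes to caching newly loaded data.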

Description

Technical field

[0001] The present invention relates to a massive data management system and method in the field of information security, and more specifically to a query and distributed data management method and system oriented to complex query tasks. It is mainly used for persistent storage and analysis of network messages in the field of information security, and for applications such as statistics and analysis of massive log data.

Background technique

[0002] In contemporary information security, data management is no longer limited to simple data processing methods such as traditional data sampling and analysis. Instead, efficient data storage systems are used to persist data and to support complex statistics and analysis after the event.

[0003] Since the currently widely used relational databases are subject to consistency constraints, the query methods and query systems based on relational databases have low loading efficiency and slow retrieval speed under...


Application Information

IPC(8): G06F17/30
Inventor: 吴广君, 李超, 王树鹏, 云晓春, 王勇, 李斌斌
Owner: 国信电子票据平台信息服务有限公司