Distributed query method and system for complex task of querying massive structured data

A distributed query technology for structured data, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problems that HIVE has no index, has low query efficiency, and cannot directly query stream record data.

Active Publication Date: 2012-06-27
国信电子票据平台信息服务有限公司

AI Technical Summary

Problems solved by technology

As a result, HIVE's real-time query efficiency is low and its latency is high; online data loading with fast querying cannot be achieved, and query applications that do not require large result sets are poorly served.
[0006] (2) HIVE has no index, so all of its query operations are performed by reading the raw data files. Query efficiency is therefore low.
[0007] (3) In HIVE's query process, the user describes the query rules in HQL (a query language similar to SQL), and the query is executed with a MapReduce-oriented task decomposition method (that is, multiple disk write and read operations are required when executing query tasks), so its...



Examples


Embodiment Construction

[0021] In the distributed storage method and query method for massive structured data of the present invention, the data structure comprises two basic parts: a fully sorted index and record data. The fully sorted index sorts all attribute values of the records in dictionary (lexicographic) order. The record data stores each record sequentially, row by row. The fully sorted index supports filtering on query conditions, such as the conditions in a WHERE clause.
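The two-part layout described in [0021] can be illustrated with a minimal sketch. The class name `RecordStore`, the method `where_equals`, and the choice to key the index by (attribute, value as string, row number) are assumptions made for illustration; the patent text shown here does not specify the index format.

```python
import bisect


class RecordStore:
    """Hypothetical sketch: row-ordered record data plus a fully sorted index."""

    def __init__(self):
        self.rows = []   # record data: one record per row, in arrival order
        self.index = []  # sorted list of (attribute, value_as_string, row_id)

    def append(self, record):
        """Store a record and index every attribute value in dictionary order."""
        row_id = len(self.rows)
        self.rows.append(record)
        for attr, value in record.items():
            # Values are compared as strings, i.e. in dictionary order.
            bisect.insort(self.index, (attr, str(value), row_id))
        return row_id

    def where_equals(self, attr, value):
        """Answer an equality filter (WHERE attr = value) from the index."""
        key = str(value)
        lo = bisect.bisect_left(self.index, (attr, key, -1))
        hi = bisect.bisect_right(self.index, (attr, key, float("inf")))
        return [self.rows[row_id] for _, _, row_id in self.index[lo:hi]]


store = RecordStore()
store.append({"src": "10.0.0.1", "port": 80})
store.append({"src": "10.0.0.2", "port": 443})
store.append({"src": "10.0.0.1", "port": 443})
matches = store.where_equals("port", 443)
```

Because the index is kept sorted, an equality filter becomes two binary searches rather than a scan of the raw record data, which is the contrast with HIVE drawn earlier in the text.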

[0022] Before describing the present invention in detail, the related concept of a "batch query" is first defined. A batch query is a query task with a large result set, for which the user can choose, as needed, either a single query that returns a small part of the result set or multiple queries that together return the entire result set.
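One way to picture the batch query defined in [0022] is a cursor that hands back one batch per call and remembers where it stopped. This is only a sketch: `BatchQuery`, `next_batch`, `fetch_all`, and the `batch_size` parameter are hypothetical names, and the patent text shown here does not describe how the intermediate state is actually stored.

```python
import itertools


class BatchQuery:
    """Hypothetical cursor over a large result set, consumed in batches."""

    def __init__(self, matches, batch_size):
        self._iter = iter(matches)  # intermediate state kept between calls
        self.batch_size = batch_size
        self.exhausted = False

    def next_batch(self):
        """Return up to batch_size further results; a short batch ends the query."""
        batch = list(itertools.islice(self._iter, self.batch_size))
        if len(batch) < self.batch_size:
            self.exhausted = True
        return batch

    def fetch_all(self):
        """Issue repeated batch requests until the whole result set is returned."""
        rows = []
        while not self.exhausted:
            rows.extend(self.next_batch())
        return rows


query = BatchQuery(range(5), batch_size=2)
first_page = query.next_batch()  # a single query for a small part of the results
remainder = query.fetch_all()    # further queries for everything that is left
```

A display-oriented caller would stop after `first_page`, while a statistics job would keep calling until the cursor is exhausted.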

[0023] The present invention will be described below in conjunction with the accompanying drawings and specific embodiments.

[0024] In the query me...


Abstract

The invention provides a distributed query method and a distributed query system for complex query tasks over massive structured data. The distributed query method comprises the following steps: receiving a query task from a user and decomposing it into a plurality of query subtasks; and, for each of the query subtasks, concurrently querying in batches the data stored in a distributed manner, and returning the query result sets in a distributed manner. Because the query method uses batch queries and keeps the intermediate result state, it fully accommodates interface-display applications that need fast queries over small data volumes, while also meeting the statistical counting requirements of large result sets in statistics and analysis scenarios.
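The decompose-then-query-concurrently flow in the abstract can be sketched as follows. This is an assumption-laden illustration, not the patented system: threads stand in for storage nodes, each partition is an in-memory list, and the names `run_subtask` and `distributed_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtask(partition_rows, predicate):
    """One query subtask: filter the rows held by a single partition."""
    return [row for row in partition_rows if predicate(row)]


def distributed_query(partitions, predicate, max_workers=4):
    """Decompose a query into one subtask per partition and run them concurrently.

    Results are returned per partition rather than merged, mirroring the idea
    of returning result sets in a distributed manner.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_subtask, rows, predicate)
            for name, rows in partitions.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical partitions; in a real deployment each would live on its own node.
partitions = {"node-a": [1, 5, 9], "node-b": [2, 6]}
results = distributed_query(partitions, lambda value: value > 4)
```

Keeping the per-partition result sets separate lets a caller page through one node's results without waiting for a global merge.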

Description

Technical field

[0001] The present invention relates to a massive data management system and method in the field of information security, and more specifically to a query and distributed data management method and system oriented to complex query tasks. It is mainly used for persisting network messages to storage and analyzing them in the field of information security, and for applications such as statistics and analysis of massive log data.

Background technique

[0002] In contemporary information security, data management is no longer limited to simple processing methods such as traditional data sampling and analysis; instead, efficient data storage systems are used to persist data and to support complex statistics and analysis after the fact.

[0003] Since the currently widely used relational databases are subject to consistency constraints, the query methods and query systems based on relational databases have low loading efficiency and slow retrieval speed under...


Application Information

IPC(8): G06F17/30
Inventor: 吴广君, 李超, 王树鹏, 云晓春, 王勇, 李斌斌
Owner: 国信电子票据平台信息服务有限公司