Method and system for performing multi-dimensional interval queries on a distributed ordered table

A distributed ordered table and multi-dimensional query technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses slow query speed, long response time, and heavy system load when retrieving massive data in real time, and achieves higher query speed, low storage overhead, and high reliability.

Active Publication Date: 2013-04-03
北京东方国信科技股份有限公司

AI Technical Summary

Problems solved by technology

When the amount of data is very large, scanning the entire table is slow and places a heavy load on the system, and the response time is too long to meet the needs of current network applications for real-time retrieval of massive data.

Examples

Embodiment 1

[0029] Figure 1 is a flowchart of the method for performing multi-dimensional interval queries on a distributed ordered table described in Embodiment 1 of the present invention. As shown in Figure 1, the method of this embodiment includes:

[0030] S101. Create a secondary index table for each index column;

[0031] The data stored in the distributed ordered table is partitioned horizontally into multiple fragments (Regions) by primary key. Each fragment stores a contiguous range of data sorted by primary key, and the fragments are distributed across multiple fragment servers (RegionServers). The table supports high-speed point queries and interval queries by primary key, as well as high-speed random reads and writes. Each table can have one or more columns, and operations such as column-wise projection are supported.
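The partitioning described in [0031] can be illustrated with a minimal single-process sketch. The class name `RegionTable`, the split keys, and the in-memory dictionaries standing in for fragment servers are all assumptions made for illustration; the source does not specify how fragments are located or stored.

```python
import bisect


class RegionTable:
    """Toy ordered table split into regions by primary-key range (hypothetical)."""

    def __init__(self, split_keys):
        # Sorted boundaries: region 0 holds keys below split_keys[0],
        # region i holds keys in [split_keys[i-1], split_keys[i]), and so on.
        self.split_keys = sorted(split_keys)
        self.regions = [dict() for _ in range(len(self.split_keys) + 1)]

    def _region_for(self, key):
        # Binary search over the boundaries finds the owning region.
        return bisect.bisect_right(self.split_keys, key)

    def put(self, key, row):
        self.regions[self._region_for(key)][key] = row

    def get(self, key):
        """Point query by primary key: touches exactly one region."""
        return self.regions[self._region_for(key)].get(key)

    def scan(self, start, stop):
        """Interval query: yield (key, row) with start <= key < stop, in key order."""
        first = self._region_for(start)
        last = self._region_for(stop)
        # Only regions whose key range overlaps [start, stop) are visited.
        for i in range(first, last + 1):
            for key in sorted(self.regions[i]):
                if start <= key < stop:
                    yield key, self.regions[i][key]


table = RegionTable(["g", "p"])
for k in ["a", "h", "q", "m"]:
    table.put(k, {"name": k.upper()})
rows_in_range = list(table.scan("b", "q"))
```

Because regions are ordered by key range, concatenating the per-region results already yields globally sorted output, which is why an interval query by primary key needs no merge step in this sketch.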

[0032] When data is queried, the query condition is often given by specifying one or more columns, which are ofte...

Embodiment 2

[0048] To further optimize interval queries on the distributed ordered table, when an interval query must be performed on an index column, a query step optimized by fragment-information estimation may be added before step S102 of Embodiment 1. To ensure that the data in the distributed ordered table and the index tables always remain consistent, the present invention may also include a consistent update step whenever the distributed ordered table needs to be updated.

[0049] Query step optimized by fragment-information estimation:

[0050] When an interval query must be performed on index columns, the query plan tree is first preprocessed by merging and deduplication, and the query logic is then converted into disjunctive form and executed in parallel. During execution, the subquery with the smallest result set is selected from each conjunctive clause and executed, while the other subqueries filter the result set in the fo...
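As a rough illustration of the strategy in [0050], the sketch below takes a query already in disjunctive form (a list of conjunctive clauses), picks the predicate with the smallest estimated result set in each clause as the driving subquery, and uses the remaining predicates as filters. The names `RangePredicate`, `run_conjunction`, `run_dnf`, and the `estimate` callback are hypothetical; the source text is truncated, so how estimates are obtained and how filtering is carried out are assumptions here. The clauses run sequentially for simplicity, whereas the source describes parallel execution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class RangePredicate:
    """Closed interval condition low <= row[column] <= high (hypothetical type)."""
    column: str
    low: int
    high: int

    def matches(self, row: Row) -> bool:
        return self.low <= row[self.column] <= self.high


def run_conjunction(preds: List[RangePredicate], rows: Dict[int, Row],
                    estimate: Callable[[RangePredicate], int]) -> List[int]:
    # Drive the clause with the predicate expected to return the fewest rows.
    driver = min(preds, key=estimate)
    others = [p for p in preds if p is not driver]
    # Stand-in for an index scan on the driver's column.
    candidates = [pk for pk, row in rows.items() if driver.matches(row)]
    # The remaining predicates only filter the (small) candidate set.
    return [pk for pk in candidates if all(p.matches(rows[pk]) for p in others)]


def run_dnf(clauses: List[List[RangePredicate]], rows: Dict[int, Row],
            estimate: Callable[[RangePredicate], int]) -> List[int]:
    # Union of the clause results, deduplicated by primary key.
    seen, out = set(), []
    for preds in clauses:
        for pk in run_conjunction(preds, rows, estimate):
            if pk not in seen:
                seen.add(pk)
                out.append(pk)
    return out


rows = {
    1: {"age": 30, "score": 80},
    2: {"age": 25, "score": 95},
    3: {"age": 41, "score": 60},
}
# Assumed estimator: an exact count, standing in for fragment statistics.
estimate = lambda p: sum(p.matches(r) for r in rows.values())
clauses = [
    [RangePredicate("age", 20, 35), RangePredicate("score", 90, 100)],
    [RangePredicate("age", 40, 50)],
    [RangePredicate("score", 90, 100)],
]
matching_keys = run_dnf(clauses, rows, estimate)
```

Choosing the most selective predicate as the driver keeps the candidate set small, so the other conditions are checked against a few rows rather than each triggering its own full index scan.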

Embodiment 3

[0061] Based on the same idea, the present invention also provides a system for performing multi-dimensional interval queries on a distributed ordered table. Figure 3 is a structural block diagram of the system described in this embodiment. As shown in Figure 3, the system includes:

[0062] The index table building module 301 is used to build index tables on the distributed ordered table, wherein: an index table is created for each index column of the distributed ordered table, and the index column value, the primary key value, and the length of the index column value are concatenated to form the primary key of the index table; this primary key serves as the secondary index of the distributed ordered table;

[0063] When data...

Abstract

The invention discloses a method and a system for performing multi-dimensional interval queries on a distributed ordered table. The method comprises the following steps: a corresponding secondary index table is created for each index column of the distributed ordered table, where the primary key of each secondary index table is the combination of the index column value, the primary key of the distributed ordered table, and the length of the index column value; when an interval query request is received, the secondary index table corresponding to the field name in the request is located among the secondary index tables, the record positions corresponding to the field values in the request are looked up in that secondary index table, and the corresponding data is read directly from those record positions in the distributed ordered table. The method and system can greatly speed up multi-dimensional interval queries while meeting the requirements of high performance, low storage cost, and high reliability.
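The index layout and lookup path summarized in the abstract can be sketched as follows. This is an illustrative toy, not the patented implementation: the names `make_index_key`, `split_index_key`, and `IndexedTable` are hypothetical, index values are assumed to be non-negative integers encoded as 8-byte big-endian so that byte order matches numeric order, and the length field is assumed to be a 4-byte suffix. Updates that change an indexed value (the consistent update step of Embodiment 2) are not handled.

```python
import bisect
import struct


def make_index_key(value: int, primary_key: bytes) -> bytes:
    """Index key = index column value + main-table primary key + value length."""
    value_bytes = struct.pack(">Q", value)  # assumed fixed-width encoding
    return value_bytes + primary_key + struct.pack(">I", len(value_bytes))


def split_index_key(key: bytes):
    """Use the trailing length field to separate the value from the primary key."""
    (value_len,) = struct.unpack(">I", key[-4:])
    (value,) = struct.unpack(">Q", key[:value_len])
    return value, key[value_len:-4]


class IndexedTable:
    def __init__(self, index_columns):
        self.rows = {}                                     # primary key -> row
        self.indexes = {c: [] for c in index_columns}      # column -> sorted index keys

    def put(self, primary_key: bytes, row: dict):
        # Insert only; re-putting a key with a changed value would leave a stale entry.
        self.rows[primary_key] = row
        for column, keys in self.indexes.items():
            bisect.insort(keys, make_index_key(row[column], primary_key))

    def range_query(self, column: str, low: int, high: int):
        """Return rows with low <= row[column] <= high, ordered by that column."""
        keys = self.indexes[column]          # pick the index table by field name
        start = bisect.bisect_left(keys, struct.pack(">Q", low))
        results = []
        for key in keys[start:]:
            value, pk = split_index_key(key)
            if value > high:                 # keys are sorted, so the scan can stop
                break
            results.append(self.rows[pk])    # read directly by primary key
        return results


t = IndexedTable(["age"])
t.put(b"u1", {"age": 30, "name": "x"})
t.put(b"u2", {"age": 25, "name": "y"})
t.put(b"u3", {"age": 41, "name": "z"})
hits = t.range_query("age", 26, 41)
```

Placing the index value first makes the index table sorted by that value, so an interval condition becomes a contiguous scan; the trailing length lets a reader recover the primary key from the composite key without a separate delimiter.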

Description

Technical field

[0001] The invention relates to the technical field of distributed information processing, in particular to a method and system for performing multi-dimensional interval queries on a distributed ordered table.

Background

[0002] A Distributed Ordered Table (DOT for short) is a database system well suited to multi-dimensional interval queries over massive data (TB to PB scale). When a multi-dimensional interval query is performed on a distributed ordered table, the entire table is usually scanned directly to filter out the data that meets the conditions. When the amount of data is very large, this approach is slow and places a heavy load on the system, and the response time is too long to meet the current demand of network applications for real-time retrieval of massive data.

Summary of the invention

[0003] The main purpose of the present invention is to construct an index based on a distributed ordered table so that it can meet the requirements...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventor: 刘佳谷靖宇查礼
Owner: 北京东方国信科技股份有限公司