Method and system for locality non-cluster index based on streaming data

A streaming-data locality technology in the fields of electrical digital data processing, special-purpose data processing applications, and instruments. It addresses the high overhead of the index table, degraded read performance, and loss of spatial locality, with the effects of increasing throughput, reducing index space, and reducing load.

Active Publication Date: 2016-02-17
INST OF COMPUTING TECH CHINESE ACAD OF SCI
Cites: 5; Cited by: 4

AI Technical Summary

Problems solved by technology

[0008] 1. Storing the index table takes a lot of space, and building the index table in real time for streaming data is very expensive.
[0009] 2. Scanning the index table sequentially leads to random access to the data table, loss of spatial locality, and degraded read performance.
[0010] 3. The temporal locality of streaming data is not exploited.




Embodiment Construction

[0048] The technical solutions of the present invention are described in detail below with reference to embodiments, which are not to be construed as limiting the present invention.

[0049] While researching and developing a streaming data storage system, the inventors found that streaming data exhibits temporal locality. Temporal locality here means that streaming data is correlated over a period of time: when data is published, multiple pieces of data on the same topic tend to arrive in sequence. Because of this correlation in content, the values of the index key that appear within a given period cover only a relatively small range rather than all possible values of the index key; in other words, the cardinality of the indexed item within that period is very small.
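The property can be checked on a toy example. The following is a minimal sketch, assuming the index-key values of the stream are available as a list in arrival order and using a hypothetical fixed window of `window` records in place of a time period; `window_cardinalities` is an illustrative name, not part of the invention. It counts the distinct index-key values in each window, which stays small when the stream has temporal locality.

```python
from typing import List


def window_cardinalities(index_keys: List[str], window: int) -> List[int]:
    """Number of distinct index-key values in each consecutive window of records."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [len(set(index_keys[i:i + window])) for i in range(0, len(index_keys), window)]


# Hypothetical stream in which consecutive records tend to share a topic.
stream = ["a", "a", "b", "a", "c", "c", "c", "d", "d", "e", "d", "d"]
print(window_cardinalities(stream, 4))  # [2, 2, 2]
print(len(set(stream)))                 # 5 distinct values overall
```

Each window of four records touches only two of the five values seen overall. The window size is arbitrary here; the text does not specify how the period is chosen.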

[0050] However, existing approaches to building and querying data indexes do not take advantage of this temporal locality, and the index structure...



Abstract

The invention discloses a method and a system for a locality non-clustered index based on streaming data. The method comprises: a real-time update step, in which, for each received piece of streaming data, the corresponding index record in a hash index table is updated in real time, each index record storing an index key from the streaming data, the primary key of the data in which that index key first appeared, and the number of data items covered from the key's first appearance to its most recent appearance; and a writing step, in which, when a trigger condition is met, the index records in the hash index table are written into an index table and the real-time update step continues. The method greatly reduces the space of the index table and the bandwidth cost of building it. By combining random access with sequential scanning, the index method makes effective use of the temporal locality of streaming data, better matches the access model of storage media, and improves the query efficiency of index data.
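The two steps in the abstract, together with the query pattern it describes (random access to a range followed by a sequential scan), can be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation: primary keys are assumed to be consecutive integers, the data table and index table are in-memory lists, and the trigger condition is a hypothetical fixed record count (`flush_every`). `LocalityIndex`, `IndexRecord`, and the method names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IndexRecord:
    index_key: str
    first_pk: int  # primary key of the record where index_key first appeared
    covered: int   # records spanned from first to latest appearance, inclusive


class LocalityIndex:
    """Hypothetical sketch of a locality non-clustered index over a stream."""

    def __init__(self, flush_every: int) -> None:
        self.flush_every = flush_every                 # hypothetical trigger condition
        self.hash_table: Dict[str, IndexRecord] = {}   # in-memory hash index table
        self.index_table: List[IndexRecord] = []       # stands in for the persisted index table
        self.data_table: List[Tuple[int, str]] = []    # (primary_key, index_key), append-only
        self._since_flush = 0

    def append(self, pk: int, index_key: str) -> None:
        """Real-time update step (assumes consecutive integer primary keys)."""
        self.data_table.append((pk, index_key))
        rec = self.hash_table.get(index_key)
        if rec is None:
            # First appearance in this interval: remember where the range starts.
            self.hash_table[index_key] = IndexRecord(index_key, pk, 1)
        else:
            # Later appearance: stretch the covered range to this record.
            rec.covered = pk - rec.first_pk + 1
        self._since_flush += 1
        if self._since_flush >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        """Writing step: move hash-table records to the index table, then keep updating."""
        self.index_table.extend(self.hash_table.values())
        self.hash_table.clear()
        self._since_flush = 0

    def query(self, index_key: str) -> List[int]:
        """Jump to each covered range (random access), then scan it sequentially."""
        if not self.data_table:
            return []
        base = self.data_table[0][0]
        hits: List[int] = []
        # Linear pass over index records for brevity; a real index table would be keyed.
        for rec in self.index_table + list(self.hash_table.values()):
            if rec.index_key != index_key:
                continue
            start = rec.first_pk - base
            for pk, key in self.data_table[start:start + rec.covered]:
                if key == index_key:
                    hits.append(pk)
        return hits


idx = LocalityIndex(flush_every=4)
stream = ["a", "a", "b", "a", "c", "c", "c", "d", "d", "e", "d", "d"]
for pk, key in enumerate(stream, start=100):
    idx.append(pk, key)
print(len(idx.index_table))  # 6
print(idx.query("d"))        # [107, 108, 110, 111]
```

In this example, 12 data records produce 6 index records, because each flush interval stores one range per distinct key rather than one entry per record. The query reads each range contiguously, which is how the sketch reflects the "random access plus sequential scan" combination; the price is that non-matching records inside a range (such as `e` at 109) are also read.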

Description

Technical Field

[0001] The invention relates to the field of large-scale data processing, and in particular to a localized non-clustered index method and system based on streaming data.

Background

[0002] Current streaming index technology mainly uses the indexing methods of traditional databases to build a corresponding index for each piece of data in real time.

[0003] Figure 1 is a schematic diagram of a prior-art index construction method.

[0004] The left-hand table in Figure 1 shows the streaming data continuously received by the streaming data processing system, presented as a data table. The primary key identifies each piece of data and increases over time; the column to be indexed holds the data items that can be indexed; and each piece of data may also carry other data items, such as data columns.

[0005] In the existing technology, in order to facilitate retrieval and...
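Paragraph [0002] says the prior art builds an index entry for every piece of data. A minimal sketch of that kind of per-record non-clustered index follows, assuming a sorted list of (index key, primary key) pairs stands in for the index table; the function names are hypothetical and the details of Figure 1 are not reproduced here.

```python
import bisect
from typing import List, Tuple


def build_per_record_index(data: List[Tuple[int, str]]) -> List[Tuple[str, int]]:
    """One (index_key, primary_key) entry for every record, sorted by index key."""
    return sorted((key, pk) for pk, key in data)


def lookup(index: List[Tuple[str, int]], key: str) -> List[int]:
    """Binary-search to the key's run of entries and collect its primary keys."""
    i = bisect.bisect_left(index, (key,))  # (key,) sorts before every (key, pk)
    pks: List[int] = []
    while i < len(index) and index[i][0] == key:
        pks.append(index[i][1])
        i += 1
    # Each returned primary key would then be fetched from the data table individually.
    return pks


data = list(enumerate(["a", "a", "b", "a", "c", "c", "c", "d", "d", "e", "d", "d"], start=100))
index = build_per_record_index(data)
print(len(index))           # 12
print(lookup(index, "d"))   # [107, 108, 110, 111]
```

Every record produces an index entry, and each primary key returned must be fetched from the data table one by one, which corresponds to the space cost and loss of spatial locality listed in [0008] and [0009].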


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/2255; G06F16/24568
Inventors: 郑天祺 (Zheng Tianqi), 程学旗 (Cheng Xueqi), 张敬亮 (Zhang Jingliang), 黄淳 (Huang Chun)
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI