An HBase secondary index system and method based on Elasticsearch

A secondary index and indexing technology, applied in the field of indexing systems, that addresses problems such as low query efficiency and the inability to support complex business retrieval and real-time retrieval.

Active Publication Date: 2019-02-01
THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP

AI Technical Summary

Problems solved by technology

[0006] Purpose of the invention: To overcome the deficiencies of the prior art, the present invention provides an HBase secondary index system and meth...



Embodiment Construction

[0033] As shown in Figure 1, the present invention builds on open-source big data software. For unstructured data, and drawing on efficient indexing of column data, it designs a secondary index method for a distributed column database. Secondary index technology is used to build an index system over the column values of the distributed column database, and coprocessor technology creates the index tables efficiently, automatically, and safely, overcoming HBase's own lack of column family indexes. The system supports dynamically adding and removing indexes, reduces the heavy network overhead caused by random queries, and improves the scalability and practicality of the secondary index. In addition, existing Hadoop-based SQL querying mainly relies on Hive, which converts each SQL operation into a MapReduce task and is therefore inefficient. The present invention aims at the problem of weak SQL operation...
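The index maintenance described in this paragraph can be illustrated with a small in-memory sketch. It is not the patented implementation and uses neither the HBase nor the Elasticsearch client: `Table` and `SecondaryIndex` are hypothetical stand-ins, and the hook inside `put` only plays the role that a post-write coprocessor would play, under the assumption that the index maps a column value back to row keys.

```python
from collections import defaultdict


class SecondaryIndex:
    """Hypothetical stand-in for the external index: (column, value) -> row keys."""

    def __init__(self, indexed_columns):
        self.indexed_columns = set(indexed_columns)
        self._postings = defaultdict(set)

    def add(self, row_key, column, value):
        # Only columns that are currently indexed get an entry.
        if column in self.indexed_columns:
            self._postings[(column, value)].add(row_key)

    def remove(self, row_key, column, value):
        self._postings[(column, value)].discard(row_key)

    def drop_column(self, column):
        # Dynamic removal of an index: forget the column and its postings.
        self.indexed_columns.discard(column)
        for key in [k for k in self._postings if k[0] == column]:
            del self._postings[key]

    def lookup(self, column, value):
        return sorted(self._postings.get((column, value), ()))


class Table:
    """Hypothetical stand-in for a column-store table keyed by row key."""

    def __init__(self, index):
        self._rows = {}
        self._index = index

    def put(self, row_key, columns):
        old = self._rows.get(row_key, {})
        # This loop plays the role of a post-write coprocessor hook:
        # the index is updated as a side effect of every write.
        for column, value in columns.items():
            if column in old:
                self._index.remove(row_key, column, old[column])
            self._index.add(row_key, column, value)
        self._rows[row_key] = {**old, **columns}

    def add_index(self, column):
        # Dynamic addition of an index: backfill from existing rows.
        self._index.indexed_columns.add(column)
        for row_key, columns in self._rows.items():
            if column in columns:
                self._index.add(row_key, column, columns[column])

    def query(self, column, value):
        # Resolve row keys through the index, then fetch rows by key.
        return [(k, self._rows[k]) for k in self._index.lookup(column, value)]


index = SecondaryIndex(indexed_columns=["city"])
table = Table(index)
table.put("row-001", {"city": "Nanjing", "name": "a"})
table.put("row-002", {"city": "Beijing", "name": "b"})
table.put("row-003", {"city": "Nanjing", "name": "c"})
nanjing_keys = [k for k, _ in table.query("city", "Nanjing")]
```

Because a query first resolves row keys through the index and only then fetches rows by key, a lookup on a non-row-key column avoids scanning the whole table, which is the kind of saving the paragraph attributes to the secondary index.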


Abstract

The invention discloses an HBase secondary index system based on Elasticsearch. The system comprises a data table design module, a key-value matching module, a row key return module, a multi-table joint query module, a standard SQL query module and an interface module. A secondary index is designed for a distributed column database: secondary index technology builds an index system over the column values of the distributed column database, and coprocessor technology creates index tables automatically and efficiently, overcoming HBase's lack of column family indexes. The system supports dynamically adding and removing indexes, reduces the network overhead caused by random queries, and improves the scalability and practicality of the secondary index. To address the weak SQL capability of distributed data storage, an SQL parser and executor are designed and a parallel SQL query engine is constructed. SQL operations are transformed into Region scan operations using the distributed column database API, coprocessors and filters, and efficient parallel SQL queries are achieved by making full use of the secondary index mechanism.
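As a rough illustration of turning an SQL operation into filtered Region scans that run in parallel, the sketch below parses a deliberately tiny SQL subset and scans in-memory "regions" concurrently. Everything here is an assumption made for illustration: the supported grammar (`SELECT ... FROM ... WHERE col = 'value'`), the names `parse_sql`, `scan_region` and `execute`, and the representation of a region as a list of `(row_key, row)` pairs. It does not call the HBase API and covers only the parse, filter and parallel scan steps, not the secondary index lookup.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, deliberately small grammar: one table, one equality predicate.
SQL_PATTERN = re.compile(
    r"^\s*SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)'\s*;?\s*$",
    re.IGNORECASE,
)


def parse_sql(sql):
    """Parse the supported SQL subset into a simple scan plan."""
    match = SQL_PATTERN.match(sql)
    if match is None:
        raise ValueError(f"unsupported SQL: {sql!r}")
    columns = [c.strip() for c in match.group("cols").split(",")]
    return {
        "columns": columns,
        "table": match.group("table"),
        "filter": (match.group("col"), match.group("val")),
    }


def scan_region(region, plan):
    """Scan one region, applying the plan's filter and column projection."""
    column, value = plan["filter"]
    hits = []
    for row_key, row in region:
        if row.get(column) != value:
            continue  # filtered out, as a server-side filter would do
        if plan["columns"] == ["*"]:
            projected = dict(row)
        else:
            projected = {c: row.get(c) for c in plan["columns"]}
        hits.append((row_key, projected))
    return hits


def execute(sql, regions):
    """Run the query as one scan per region, in parallel, and merge by row key."""
    plan = parse_sql(sql)
    with ThreadPoolExecutor(max_workers=max(1, len(regions))) as pool:
        partials = list(pool.map(lambda region: scan_region(region, plan), regions))
    merged = [hit for part in partials for hit in part]
    return sorted(merged, key=lambda hit: hit[0])


# Two hypothetical regions, each holding a contiguous range of row keys.
regions = [
    [("r1", {"city": "Nanjing", "name": "a"}), ("r2", {"city": "Beijing", "name": "b"})],
    [("r3", {"city": "Nanjing", "name": "c"})],
]
rows = execute("SELECT name FROM users WHERE city = 'Nanjing'", regions)
```

Each region is scanned independently and the partial results are merged by row key afterwards, so the work scales with the number of regions rather than being funneled through a single sequential scan.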

Description

Technical field

[0001] The present invention relates to an index system and method, in particular to an Elasticsearch-based HBase secondary index system and method.

Background

[0002] With the explosive growth of data volume, file systems are also growing in scale. The number of files in a system can reach tens of millions or even hundreds of millions, and both file system administrators and users need to locate the files they need through file metadata. How to organize and index massive amounts of metadata in distributed file systems is an urgent problem.

[0003] HBase is a NoSQL database running on Hadoop. It is a distributed, scalable big data store that combines key/value storage for real-time queries with offline or batch processing through MapReduce. However, driven by the applications built on HBase, it has been found that the Global-Rowkey-Index no longer meets application requirements. The single way of retrieving data through...


Application Information

IPC(8): G06F16/22, G06F16/242
Inventor: 徐琳, 王犇, 贺成龙, 吴蔚
Owner THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP