
HBase-based incremental index creation and retrieval method

An incremental indexing and retrieval technology, applied in special data processing applications, instruments, and electronic digital data processing. It addresses the problem that continuously growing data cannot be indexed and retrieved efficiently, and it improves index capacity, usability, and speed.

Active Publication Date: 2013-11-13
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0011] The purpose of the present invention is to solve the problem that existing methods cannot quickly and effectively build indexes over, and retrieve from, continuously growing data. Based on the column storage mechanism of HBase and the format characteristics of indexes, it proposes an HBase-based method for constructing and retrieving incremental indexes.



Examples


Embodiment Construction

[0043] The specific content of the method for constructing and retrieving incremental indexes based on HBase of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0044] As shown in Figure 1, the present invention uses HBase to store the index and consists of an indexing system and a retrieval system.

[0045] A. Use HBase to store indexes

[0046] The index storage structure takes advantage of the fact that HBase data columns are dynamically expandable: the token is used as the primary key (row key) of each keyword entry, and the primary key of each text is used as a column field name in the index storage table. As the number of indexed texts grows, the number of column fields grows dynamically. Using HBase's distributed storage, large-scale text and index information can be stored, and the index system can serve multiple different data sources at the same time. Indexing service:...
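The storage layout described in [0046] can be illustrated with a minimal in-memory sketch. This is not the patent's implementation and does not call HBase: a nested dictionary stands in for the index table, and the names `index_table`, `tokenize`, and `index_text`, the whitespace tokenizer, and the choice of token positions as cell values are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the HBase index table:
# row key (token) -> {column name (text primary key) -> cell value}.
index_table = defaultdict(dict)


def tokenize(text):
    """Naive whitespace tokenizer (an assumption; the source does not specify one)."""
    return text.lower().split()


def index_text(doc_id, text):
    """Index one text: each token is a row key, the text's primary key is a column."""
    for position, token in enumerate(tokenize(text)):
        # Cell value: list of token positions in this text (illustrative choice only).
        index_table[token].setdefault(doc_id, []).append(position)


index_text("doc1", "HBase stores sparse columns")
index_text("doc2", "HBase index columns grow")

# The row for "hbase" now has one column per indexed text.
print(sorted(index_table["hbase"]))
```

Indexing a further text only adds columns to existing rows or creates new rows, which mirrors how the number of column fields grows with the number of indexed texts.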


Abstract

The invention discloses an HBase-based incremental index creation and retrieval method. The method includes the following steps. First, an index storage structure is designed on the basis of the column storage mechanism of HBase, with three data tables storing the original texts, the index information, and the statistical information respectively. Second, a Web-oriented interface for acquiring texts to be indexed is designed, providing an HTTP-based text indexing service. Third, incremental indexes are created for continuously growing texts: when new texts to be indexed arrive, the indexing system does not rebuild the index over all data but appends the indexes of the new texts to the existing indexes. Before the indexes are stored, all text contents and their index information are first placed in a buffer, and the data are written in batches once the data volume of the buffer reaches a threshold. Fourth, a retrieval service interface supporting a variety of result formats is provided: users access the retrieval service through the Web-oriented interface, and the indexing system performs retrieval according to the search requests submitted by users and formats the results as the users require.
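The buffering and incremental-append steps in the abstract can be sketched as follows. This is a toy model under stated assumptions, not the patented system: plain dictionaries stand in for the three HBase tables, the threshold counts buffered texts rather than bytes, and the class name `BufferedIndexer`, its methods, and the two output formats are hypothetical.

```python
import json
from collections import defaultdict


class BufferedIndexer:
    """Toy model: three in-memory 'tables' plus a write buffer flushed at a threshold."""

    def __init__(self, threshold=2):
        self.threshold = threshold             # hypothetical threshold (number of texts)
        self.text_table = {}                   # original texts: doc_id -> text
        self.index_table = defaultdict(dict)   # index: token -> {doc_id: term count}
        self.stats_table = {"doc_count": 0}    # statistic information
        self.buffer = []                       # pending (doc_id, text) pairs
        self.flush_count = 0                   # number of batch writes performed

    def add(self, doc_id, text):
        """Buffer a new text; write the batch once the threshold is reached."""
        self.buffer.append((doc_id, text))
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        """Append buffered texts to the existing index without rebuilding it."""
        if not self.buffer:
            return
        for doc_id, text in self.buffer:
            if doc_id not in self.text_table:
                self.stats_table["doc_count"] += 1
            self.text_table[doc_id] = text
            for token in text.lower().split():
                row = self.index_table[token]
                row[doc_id] = row.get(doc_id, 0) + 1
        self.buffer.clear()
        self.flush_count += 1

    def search(self, token, fmt="json"):
        """Return matching text keys, formatted as JSON or as plain lines."""
        hits = sorted(self.index_table.get(token.lower(), {}))
        if fmt == "json":
            return json.dumps({"query": token, "hits": hits})
        return "\n".join(hits)


indexer = BufferedIndexer(threshold=2)
indexer.add("doc1", "incremental index on HBase")
before = indexer.search("index")   # doc1 is still in the buffer at this point
indexer.add("doc2", "index retrieval service")
after = indexer.search("index")    # the batch has been written
print(before)
print(after)
```

In this sketch a text becomes searchable only after its batch is written, which is the trade-off of buffering: fewer writes to the store in exchange for a short delay before new texts appear in results.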

Description

Technical field

[0001] The present invention relates to an HBase-based method for constructing incremental indexes over continuously growing text and for optimizing the format of incremental index retrieval results. (HBase is a distributed, column-oriented open source database that supports the storage of millions of columns and hundreds of millions of rows of data.) The method mainly solves the problems of low efficiency in building indexes for continuously growing text content and of difficult cooperation between indexing and retrieval systems and other information systems.

Background technique

[0002] With the development of Internet technology, the amount of text stored in information systems grows day by day. To find the required information in it, an index must be built. When the amount of data to be indexed is extremely large, the storage capacity provided by a single computer cannot meet the storage requirements of the index. ...


Application Information

IPC(8): G06F17/30
Inventor: 郑庆华, 董博, 贺欢, 宋凯磊, 徐海鹏, 马天, 陈亚兴
Owner XI AN JIAOTONG UNIV