HBase-based incremental index creation and retrieval method

An incremental indexing and retrieval technology, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses the problem that continuously increasing data cannot be indexed and retrieved effectively, and improves capacity, usability, and speed.

Active Publication Date: 2013-11-13
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0011] The purpose of the present invention is to solve the problem that existing methods cannot quickly and effectively construct indexes and retrieve continuously increasing data. Acc

Example Embodiment

[0043] The specific content of the method for constructing and retrieving an incremental index based on HBase of the present invention will be described in detail below with reference to the accompanying drawings.

[0044] As shown in Figure 1, the present invention uses HBase to store the index and is composed of an indexing system and a retrieval system.

[0045] A. Use HBase to store indexes

[0046] The storage structure of the index exploits the dynamic, scalable nature of HBase's data columns: each term (word element) serves as the row key of the index storage table, and the primary key of each text serves as a column name. As the number of indexed texts grows, the number of columns grows dynamically. HBase's distributed storage makes it possible to store large-scale text and index information, and the indexing system can serve multiple different data sources at the same time. ...
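The layout described in [0046] can be illustrated with a minimal sketch. It uses a plain in-memory dictionary as a stand-in for the HBase index table and does not call any HBase client. The `tokenize` helper, the function name `index_text`, and the choice of storing a term frequency in each cell are assumptions made for illustration; the source only states that terms are row keys and text primary keys are column names.

```python
from collections import defaultdict

# In-memory stand-in for the index table: row key -> {column name -> cell value}.
# In HBase, the inner dict corresponds to the dynamically growing columns of a row.
index_table = defaultdict(dict)


def tokenize(text):
    """Hypothetical tokenizer: lowercase the text and split on whitespace."""
    return text.lower().split()


def index_text(text_id, text):
    """Add one column per (term, text) pair.

    The row key is the term and the column name is the text's primary key.
    Storing the term frequency as the cell value is an assumption.
    """
    for term in tokenize(text):
        row = index_table[term]
        row[text_id] = row.get(text_id, 0) + 1


index_text("doc1", "HBase stores index data")
index_text("doc2", "index data grows")

# The row for "index" now has two columns, one per text that contains the term.
print(index_table["index"])
```

Indexing a further text only adds columns to the rows of the terms it contains, so existing rows never need to be rebuilt.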

Abstract

The invention discloses an HBase-based incremental index creation and retrieval method. The method includes the following steps. First, a storage structure for indexes is designed on the basis of the column storage mechanism of HBase, with three data tables storing the original texts, the index information, and the statistical information respectively. Second, a Web-oriented interface for acquiring to-be-indexed texts is designed, providing an HTTP-based text indexing service. Third, incremental indexes are created for continuously increasing texts: when new to-be-indexed texts arrive, the indexing system does not recreate indexes for all the data but appends the indexes of the new texts to the existing indexes. Before the indexes are stored, all text contents and their index information are first placed in a buffer, and the data are written in batches once the data volume in the buffer reaches a threshold. Finally, a retrieval service interface supporting a variety of result formats is provided: users access the retrieval service through the Web-oriented interface, and the system performs retrieval according to the submitted search requests and formats the retrieval results as the users require.
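The buffering and incremental-append behavior summarized in the abstract can be sketched as follows. This is an in-memory illustration under stated assumptions, not the patented implementation: the class `BufferedIndexer`, its method names, the whitespace tokenizer, the threshold measured in number of texts, and the two output formats are all hypothetical, and the statistics table is omitted for brevity.

```python
import json
from collections import defaultdict


class BufferedIndexer:
    """Hypothetical in-memory model of buffered, incremental indexing."""

    def __init__(self, threshold=3):
        self.threshold = threshold            # flush when this many texts are pending
        self.buffer = []                      # pending (text_id, text) pairs
        self.text_table = {}                  # stand-in for the original-text table
        self.index_table = defaultdict(dict)  # term -> {text_id: frequency}
        self.batch_writes = 0                 # number of batch flushes performed

    def submit(self, text_id, text):
        """Buffer a new text; write in a batch once the threshold is reached."""
        self.buffer.append((text_id, text))
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        """Append buffered texts to the existing index without rebuilding it."""
        if not self.buffer:
            return
        for text_id, text in self.buffer:
            self.text_table[text_id] = text
            for term in text.lower().split():
                row = self.index_table[term]
                row[text_id] = row.get(text_id, 0) + 1
        self.buffer.clear()
        self.batch_writes += 1

    def search(self, term, fmt="json"):
        """Return matching text IDs in the requested (assumed) format."""
        ids = sorted(self.index_table.get(term.lower(), {}))
        if fmt == "json":
            return json.dumps({"term": term, "hits": ids})
        return "\n".join(ids)


indexer = BufferedIndexer(threshold=2)
indexer.submit("t1", "incremental index")
before_flush = indexer.search("index", fmt="text")  # t1 is still only in the buffer
indexer.submit("t2", "index retrieval")              # threshold reached, batch written
indexer.submit("t3", "new index entry")
indexer.flush()                                      # write the remaining buffered text
print(indexer.search("index"))
```

In this sketch, texts become searchable only after the batch containing them is written, which is the trade-off a buffer introduces in exchange for fewer write operations.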

Description

Technical field

[0001] The present invention relates to a method, based on HBase (a distributed, column-oriented open source database that supports the storage of millions of columns and hundreds of millions of rows of data), for constructing an incremental index over continuously increasing text and for optimizing the format of incremental index retrieval results. It mainly solves the problems of low efficiency in constructing indexes for continuously increasing text content and of difficult cooperation between indexing and retrieval systems and other information systems.

Background technique

[0002] With the development of Internet technology, the amount of text stored in information systems is increasing day by day. Finding the required information in it requires building an index. When the amount of data to be indexed is extremely large, the storage capacity provided by a single computer cannot meet the storage requirements of the index. ...


Application Information

IPC(8): G06F17/30
Inventor 郑庆华, 董博, 贺欢, 宋凯磊, 徐海鹏, 马天, 陈亚兴
Owner XI AN JIAOTONG UNIV