Reverse index mixed compression and decompression method based on Hbase database

An inverted index hybrid compression technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problems of low decompression efficiency, low compression ratio, and the lack of any application precedent, with the effect of improving decompression efficiency, compression ratio, and space utilization.

Inactive Publication Date: 2012-10-03
CHENGDU UNIV OF INFORMATION TECH
Cites: 3; Cited by: 42

AI Technical Summary

Problems solved by technology

[0007] The compression methods above improve the compression ratio or the decompression efficiency from different angles, but they share a common problem for inverted index compression: each focuses on either the compression ratio or the decompression efficiency. For an inverted index, where query efficiency matters most, file reading and data decompression form a single process, so the two factors should be considered together rather than favoring one alone.
[0008] The Hbase database currently supports only Gzip and Lzo compression. Gzip achieves a high compression ratio but has low decompression efficiency; Lzo has high decompression efficiency but a low compression ratio.
[0009] In summary, there is currently no compression and decompression method for the Hbase database that achieves both high decompression efficiency and a high compression ratio while treating file reading and data decompression as a unified process. No related reports exist in the public literature, and there is no precedent in practical application.



Examples


Embodiment Construction

[0042] The present invention will be further described in detail below in conjunction with the drawings and specific embodiments:

[0043] First, the Hbase database is explained. Hbase is a distributed column-oriented database built on Hadoop, and it uses Hadoop's HDFS (Hadoop Distributed File System) as its file system. An Hbase table can have multiple column families (Column Family). In physical storage, each column family corresponds to a folder, which can contain several Hfile files. As shown in Figure 1 and Figure 2, an Hfile is a variable-length file. Each record in a Data block stores one key/value pair, and the remaining sections mainly store the corresponding positioning and index information. Hbase automatically sorts the data in memory by row key in lexicographical order. When writing a file, all data in the memory area is written to the file block by block in one pass, and data block index information is appended to the end of the file. When reading a fi...
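The layout described in [0043] (records sorted by row key, written block by block, with a block index used to locate data) can be illustrated with a toy sketch. This is not the real Hfile format: the record header, the `per_block` parameter, and the function names `write_blocks` and `lookup` are assumptions made purely for illustration, and the index is returned as a list rather than appended to the file.

```python
import bisect
import struct
from typing import Dict, List, Optional, Tuple

BlockIndex = List[Tuple[bytes, int]]  # (first key in block, byte offset of block)


def write_blocks(records: Dict[bytes, bytes], per_block: int = 2) -> Tuple[bytes, BlockIndex]:
    """Serialise records in lexicographic key order, grouped into blocks."""
    data = bytearray()
    index: BlockIndex = []
    items = sorted(records.items())  # bytes objects compare lexicographically
    for i, (key, value) in enumerate(items):
        if i % per_block == 0:
            # A new block starts here; remember its first key and offset.
            index.append((key, len(data)))
        # Hypothetical record layout: 2-byte key length, 4-byte value length.
        data += struct.pack(">HI", len(key), len(value)) + key + value
    return bytes(data), index


def lookup(data: bytes, index: BlockIndex, key: bytes) -> Optional[bytes]:
    """Find the block that may hold `key` via the index, then scan only it."""
    first_keys = [k for k, _ in index]
    pos = bisect.bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first block
    start = index[pos][1]
    end = index[pos + 1][1] if pos + 1 < len(index) else len(data)
    while start < end:
        klen, vlen = struct.unpack_from(">HI", data, start)
        start += 6  # size of the packed header
        k = data[start:start + klen]
        v = data[start + klen:start + klen + vlen]
        if k == key:
            return v
        start += klen + vlen
    return None
```

Because the keys are sorted, a binary search over the first key of each block is enough to pick the single block that has to be read and, in a compressed file, decompressed.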


Abstract

The invention discloses an inverted index hybrid compression method based on an Hbase database. The method comprises: processing the Hbase database to obtain an inverted index data table consisting of keys and values; compressing the key part with a key dictionary compression method; compressing the value part with a variable byte code compression method; and writing the compressed content to files. The invention also discloses a decompression method for the key part of files compressed in this way. The decompression method determines the length of each compressed data item and produces the decompressed data according to two cases, length less than or equal to 13 and length greater than or equal to 25; for any other length, decompression fails. By adopting classified hybrid compression and classified decompression, the method improves the compression ratio while keeping decompression efficiency as high as possible, treats file reading and data decompression as a unified process, improves the overall query efficiency of the inverted index, and saves storage space.
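The abstract does not spell out the variable byte code scheme applied to the value part. As a rough illustration, the following sketch shows the textbook form of variable-byte coding often used for inverted index postings. It assumes the values are ascending document IDs stored as gaps, and the flag-bit convention and all function names are assumptions, not details taken from the patent.

```python
from typing import Iterable, List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Replace ascending IDs by differences from the previous ID (assumed step)."""
    return [cur - prev for prev, cur in zip([0] + doc_ids[:-1], doc_ids)]


def from_gaps(gaps: List[int]) -> List[int]:
    """Rebuild the original IDs by accumulating the gaps."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def vbyte_encode(numbers: Iterable[int]) -> bytes:
    """Store 7 payload bits per byte; the high bit flags a number's last byte."""
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("only non-negative integers can be encoded")
        chunk = [n & 0x7F]
        n >>= 7
        while n:
            chunk.append(n & 0x7F)
            n >>= 7
        chunk.reverse()      # most significant 7-bit group first
        chunk[-1] |= 0x80    # mark the terminating byte
        out.extend(chunk)
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Invert vbyte_encode by accumulating groups until a flagged byte."""
    numbers, current = [], 0
    for byte in data:
        if byte & 0x80:
            numbers.append((current << 7) | (byte & 0x7F))
            current = 0
        else:
            current = (current << 7) | byte
    return numbers
```

Small numbers occupy a single byte, so a posting list with small gaps shrinks well below a fixed 4-byte-per-ID layout, while decoding needs only shifts and masks.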

Description

Technical field

[0001] The invention relates to an inverted index compression and decompression method, in particular to a hybrid compression and decompression method for an inverted index based on an Hbase database.

Background technique

[0002] Compressing data can speed up reading data from disk into memory. With an efficient decompression algorithm, reading a compressed data block from disk into memory and then decompressing it takes less time than reading the same data block uncompressed. When the amount of data is huge, this time saving is more pronounced. Therefore, in most cases, a search engine system that uses a compressed inverted index table is more efficient than one that does not. Generally speaking, the higher the compression ratio, the lower the decompression efficiency. Blindly pursuing only one of compression efficiency, decompression efficiency, and compression ratio cannot meet actual needs....
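The trade-off in [0002] can be made concrete with a simple cost model. The sketch below assumes that reading and decompression happen one after the other and that decompression speed is measured in output bytes per second. The function names and every number in the usage note are hypothetical, not measurements from the patent.

```python
def uncompressed_read_time(size_bytes: float, disk_bytes_per_s: float) -> float:
    """Seconds needed to read the raw block from disk."""
    return size_bytes / disk_bytes_per_s


def compressed_read_time(size_bytes: float, disk_bytes_per_s: float,
                         compression_ratio: float,
                         decompress_bytes_per_s: float) -> float:
    """Seconds to read the smaller compressed block, then decompress it.

    compression_ratio is original size divided by compressed size.
    """
    read = (size_bytes / compression_ratio) / disk_bytes_per_s
    decompress = size_bytes / decompress_bytes_per_s
    return read + decompress


def compression_pays_off(size_bytes: float, disk_bytes_per_s: float,
                         compression_ratio: float,
                         decompress_bytes_per_s: float) -> bool:
    """True when the compressed path is faster than the uncompressed one."""
    return (compressed_read_time(size_bytes, disk_bytes_per_s,
                                 compression_ratio, decompress_bytes_per_s)
            < uncompressed_read_time(size_bytes, disk_bytes_per_s))
```

With made-up figures of a 1 GB block and a 100 MB/s disk, a codec with ratio 4 that decompresses at 500 MB/s comes out ahead (about 4.5 s against 10 s), whereas a codec with ratio 5 that decompresses at only 100 MB/s does not (about 12 s). This is the sense in which the compression ratio and the decompression efficiency have to be weighed together.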


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 安俊秀, 程芃森
Owner: CHENGDU UNIV OF INFORMATION TECH