Column storage compression method based on HBase

A column storage compression method in the field of big data. It addresses problems such as large data dispersion, small classification granularity, and the difficulty of guaranteeing compression efficiency. Its stated effects are stronger data compactness, reduced differences, and the avoidance of hot spots.

Status: Inactive; Publication Date: 2018-07-24
CHINESE PEOPLE'S PUBLIC SECURITY UNIVERSITY
Cites: 3; Cited by: 6

AI Technical Summary

Problems solved by technology

[0008] Therefore, existing compression strategies for column-store databases run into large data dispersion, small classification granularity, high computing cost caused by shortcomings in the supporting classification algorithms, and difficulty guaranteeing compression efficiency. This method proposes a sorting-based hybrid column/zone compression strategy.


Examples

Embodiment 1

[0055] This embodiment provides an HBase-based column storage compression method, namely a sorting-based Hybrid Compression Strategy of Column-Based Compression and Sector-Based Compression, as shown in Figure 2. First, the data of each column is read from HBase and sorted, and the sorted data is stored. Next, statistics are collected over random blocks to calculate the similarity factor S between zones. S is the defined quantity for judging the similarity of two intervals: it is obtained from the absolute difference of the feature components of the statistic T of the two zones, and the required feature components are given by the compression strategy of the first zone. If the features of the zones in the selected column are highly similar, the column's distribution is judged to be balanced, and the column's data is suited to the mixed column compression strategy; otherwise, the distribution is judged to be discrete and the mixed zone compression strategy is used. Then ...
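
The decision step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the text does not define the feature components of T, the block size, the zone size, or the threshold on S, so the features used here (normalized value span, distinct-value ratio, normalized standard deviation), the names `choose_mode`, `zone_statistic` and `similarity_factor`, and the threshold of 0.1 are all hypothetical. S is taken as the sum of the absolute component differences, and every zone is compared with the first zone, which is one possible reading of the paragraph.

```python
import random
from statistics import pstdev


def zone_features(values, global_span):
    """Hypothetical feature components of T for one block of sorted values.

    Location-invariant features are used because zones of a sorted column
    cover different value ranges.
    """
    span = (max(values) - min(values)) / global_span   # normalized value span
    distinct_ratio = len(set(values)) / len(values)    # share of distinct values
    spread = pstdev(values) / global_span              # normalized std deviation
    return (span, distinct_ratio, spread)


def zone_statistic(zone, global_span, block_size, n_blocks, rng):
    """Estimate T for a zone by averaging the features of random blocks."""
    samples = []
    for _ in range(n_blocks):
        start = rng.randrange(len(zone) - block_size + 1)
        samples.append(zone_features(zone[start:start + block_size], global_span))
    return tuple(sum(component) / n_blocks for component in zip(*samples))


def similarity_factor(t_a, t_b):
    """S: sum of absolute differences of the feature components of two zones."""
    return sum(abs(a - b) for a, b in zip(t_a, t_b))


def choose_mode(column, zone_size=1000, block_size=100, n_blocks=5,
                threshold=0.1, seed=0):
    """Return 'mixed column' for a uniform column, 'mixed zone' otherwise."""
    rng = random.Random(seed)
    data = sorted(column)                          # sort the column first
    global_span = (data[-1] - data[0]) or 1        # avoid division by zero
    zones = [data[i:i + zone_size] for i in range(0, len(data), zone_size)]
    zones = [z for z in zones if len(z) >= block_size]   # drop a short tail
    stats = [zone_statistic(z, global_span, block_size, n_blocks, rng)
             for z in zones]
    # Compare every zone with the first zone; a single zone counts as uniform.
    s_max = max((similarity_factor(stats[0], t) for t in stats[1:]), default=0.0)
    return "mixed column" if s_max <= threshold else "mixed zone"
```

Location-invariant features are chosen because, after sorting, each zone covers a different value range, so comparing raw minimums or means would make almost any column look discrete. For example, `choose_mode(list(range(10000)))` treats the evenly spread column as uniform, while a column that is half a single repeated value and half widely spread values is classified as discrete.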

Abstract

The invention discloses a column storage compression method based on HBase. Each column of data is read from HBase and reordered, and the data is saved in zones. Statistics of random blocks are counted to calculate a similarity factor S between zones; S is a defined quantity for judging interval similarity and is obtained from the absolute difference of the feature components of the statistic T of two zones. S is then used to judge whether the column's distribution is uniform or discrete: if it is uniform, a mixed column compression mode is adopted; if it is discrete, a mixed zone compression mode is adopted. The column storage compression method greatly reduces the calculation cost and improves compression efficiency.
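
To see why reordering each column before splitting it into zones tends to make the data more compact, the following sketch counts the runs of equal adjacent values that a run-length encoder would have to store per zone, with and without sorting. The fixed zone size and the function names are hypothetical; the abstract does not say how zones are sized.

```python
from itertools import groupby


def split_into_zones(values, zone_size):
    """Split a column into consecutive zones of at most zone_size values."""
    return [values[i:i + zone_size] for i in range(0, len(values), zone_size)]


def run_count(values):
    """Number of maximal runs of equal adjacent values (one RLE pair per run)."""
    return sum(1 for _ in groupby(values))


def total_runs(column, zone_size, sort_first):
    """Total runs over all zones, optionally sorting the column first."""
    data = sorted(column) if sort_first else list(column)
    return sum(run_count(zone) for zone in split_into_zones(data, zone_size))


column = ["b", "a", "b", "c", "a", "c", "b", "a"]
print(total_runs(column, 4, sort_first=False))  # 8
print(total_runs(column, 4, sort_first=True))   # 4
```

Note that in the sorted case a zone boundary splits the run of `b` values, which is one reason the choice of zone size matters.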

Description

Technical field

[0001] The invention relates to the technical field of big data, and in particular to an HBase-based column storage compression method.

Background

[0002] Data compression has always been a key concern in the data field, and many compression methods exist. Lightweight compression methods include run-length coding, dictionary coding, and null value suppression. Heavyweight compression methods include GZIP, the Lempel-Ziv family, Huffman coding, arithmetic coding, etc. The difference between the two is that lightweight algorithms operate on sequences of consecutive values, while heavyweight algorithms break the boundaries between values and treat the data as a series of bytes. The classification of commonly used lightweight and heavyweight compression algorithms is shown in Figure 1.

[0003] Research on compression strategies for column storage first began with the work on C-Store. J. Abadi et al. prop...
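
The lightweight/heavyweight distinction can be made concrete with a short Python sketch. The three lightweight encoders are simplified textbook versions (null suppression here just keeps (position, value) pairs rather than a packed bitmap), and the heavyweight path uses the standard-library `zlib` module, whose DEFLATE format combines LZ77 (a Lempel-Ziv method) with Huffman coding. None of these functions are taken from the patent.

```python
import zlib


def run_length_encode(values):
    """Lightweight: collapse runs of equal values into (value, run_length) pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return [tuple(pair) for pair in encoded]


def dictionary_encode(values):
    """Lightweight: replace each value by an integer code into a dictionary."""
    codes, dictionary = [], {}
    for v in values:
        # New values get the next free code; known values reuse their code.
        codes.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), codes


def null_suppress(values):
    """Lightweight (simplified): keep only non-null values with their positions."""
    return [(i, v) for i, v in enumerate(values) if v is not None]


def heavyweight_compress(values):
    """Heavyweight: serialize values to bytes, then DEFLATE them with zlib."""
    return zlib.compress("\n".join(map(str, values)).encode("utf-8"))
```

The lightweight outputs remain sequences of values that a query engine could scan or filter directly, whereas the `zlib` output is an opaque byte string that must be decompressed first; that is the value-boundary distinction drawn in [0002].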

Application Information

IPC(8): G06F17/30; H03M7/30
CPC: G06F16/1744; H03M7/3084
Inventors: Lu Tianliang, Sun Jingchao, Du Yanhui, Cai Manchun (芦天亮, 孙靖超, 杜彦辉, 蔡满春)
Owner: CHINESE PEOPLE'S PUBLIC SECURITY UNIVERSITY