Data compression method based on the Bigtable distributed storage system

A distributed storage and data compression technology, applied in the field of data processing. It addresses the problem of unpredictable SSTable file sizes, with the stated effects of reducing memory usage and speeding up reads.

Active Publication Date: 2016-10-12
XIDIAN UNIV
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to propose a data compression method based on the Bigtable distributed storage system that solves the problem of unpredictable SSTable file sizes when Bigtable faces highly concurrent read and write operations.

Examples

Embodiment Construction

[0026] The invention will be further described in detail below in conjunction with the drawings.

[0027] Referring to Figure 3, the data compression method of the present invention includes the following steps:

[0028] Step 1. The Bigtable distributed storage system sets a threshold for the number of SSTable files at each level according to its actual operation.

[0029] Step 2. Check whether the number of SSTable files in the Lth layer of the Bigtable distributed storage system exceeds that layer's number threshold. If it does, perform Step 3; otherwise, continue checking.
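
Steps 1 and 2 can be illustrated with a minimal Python sketch. The threshold values, the file names, and the function name `first_level_over_threshold` are hypothetical and chosen only for illustration; the source does not state how the thresholds are derived from the system's operation.

```python
from typing import Dict, List, Optional


def first_level_over_threshold(
    level_files: Dict[int, List[str]],
    thresholds: Dict[int, int],
) -> Optional[int]:
    """Return the lowest level whose file count exceeds its threshold, else None."""
    for level in sorted(level_files):
        limit = thresholds.get(level)
        if limit is not None and len(level_files[level]) > limit:
            return level
    return None


# Hypothetical per-level thresholds (Step 1); real values would be tuned
# to the running system.
thresholds = {0: 4, 1: 10, 2: 100}

# Hypothetical snapshot of the files at each level.
level_files = {
    0: ["a.sst", "b.sst"],
    1: [f"f{i}.sst" for i in range(11)],
    2: [],
}

# Step 2: find a level that needs compaction, if any.
over = first_level_over_threshold(level_files, thresholds)
```

In this snapshot only layer 1 holds more files than its threshold, so it is the layer handed to Step 3.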

[0030] Step 3. Select the SSTable file to be compressed from the Lth layer in turn.

[0031] Depending on the level L of the SSTable file, there are two situations as follows:

[0032] When L > 0, select an SSTable file arbitrarily from the Lth layer;

[0033] This is because at levels where L > 0, each SSTable file is arranged in the lexicographic order of the keywords, and the keywo...
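
One plausible reading of selecting files "in turn" is a per-layer cursor that rotates through the layer's files, so that successive compactions pick different files. The sketch below assumes that reading; the class `RoundRobinPicker` is hypothetical, and it does not cover whatever special handling the source describes for the other case, which is cut off in the text above.

```python
from typing import Dict, List


class RoundRobinPicker:
    """Rotates through the files of each layer, one pick per call (assumed policy)."""

    def __init__(self) -> None:
        # Maps a layer number to the index of the next file to pick.
        self._cursor: Dict[int, int] = {}

    def pick(self, level: int, files: List[str]) -> str:
        if not files:
            raise ValueError("layer has no files to pick from")
        # Wrap around so the rotation restarts after the last file.
        index = self._cursor.get(level, 0) % len(files)
        self._cursor[level] = index + 1
        return files[index]


picker = RoundRobinPicker()
files = ["a.sst", "b.sst", "c.sst"]  # hypothetical layer-1 files
picks = [picker.pick(1, files) for _ in range(4)]
```

Four consecutive picks on a three-file layer visit each file once and then wrap back to the first.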

Abstract

The invention discloses a data compression method based on the Bigtable distributed storage system, mainly intended to solve the prior-art problem that the sizes of the generated SSTable files are unpredictable. The method comprises the following steps: 1) the system sets a quantity threshold for the SSTable files of each level according to its running condition; 2) the system detects whether the number of SSTable files at level L exceeds the threshold; if so, step 3 is executed, otherwise detection continues; 3) the SSTable file to be compressed is selected in turn from level L; 4) all files at level L+1 whose key ranges overlap that of the SSTable file selected from level L are found, and the selected SSTable files of the two levels are merged and compressed. The method makes full use of the hierarchical structure of the SSTable files and increases the read speed for data; it can be used for merging and compressing data in a distributed storage system.
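
Step 4 of the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SSTable` class, the in-memory dictionaries, and the rule that the level L entry wins when both levels hold the same key are assumptions, and a real system would typically split the merged output into several files.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SSTable:
    """Toy stand-in for an SSTable: a non-empty key -> value mapping."""
    entries: Dict[str, str]

    @property
    def min_key(self) -> str:
        return min(self.entries)

    @property
    def max_key(self) -> str:
        return max(self.entries)


def overlaps(a: SSTable, b: SSTable) -> bool:
    """True when the closed key ranges of the two tables intersect."""
    return a.min_key <= b.max_key and b.min_key <= a.max_key


def compact(selected: SSTable, next_level: List[SSTable]) -> List[SSTable]:
    """Merge `selected` (level L) with the overlapping files of level L+1."""
    overlapping = [t for t in next_level if overlaps(selected, t)]
    untouched = [t for t in next_level if not overlaps(selected, t)]

    merged: Dict[str, str] = {}
    for table in overlapping:
        merged.update(table.entries)
    # Assumption: level L holds the newer data, so its values overwrite L+1.
    merged.update(selected.entries)

    new_table = SSTable(dict(sorted(merged.items())))
    # Keep level L+1 ordered by key range.
    return sorted(untouched + [new_table], key=lambda t: t.min_key)


selected = SSTable({"c": "new-c", "e": "new-e"})
next_level = [
    SSTable({"a": "1", "b": "2"}),
    SSTable({"d": "old-d", "e": "old-e"}),
    SSTable({"x": "9"}),
]
result = compact(selected, next_level)
```

Only the middle file of level L+1 overlaps the selected key range, so it alone is rewritten; the other two files are carried over unchanged.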

Description

Technical field

[0001] The invention belongs to the technical field of data processing, and particularly relates to a data compression method that can be used for storage and management in systems similar to the Bigtable distributed storage system.

Background technique

[0002] Bigtable is a distributed data storage system designed by Google. It is a non-relational database used to process massive amounts of data and can be reliably deployed on thousands of servers. Bigtable uses the SSTable format to store data internally, and the persistent state information of the sub-tables in Bigtable is stored on the Google File System (GFS). The read and write process for data in Bigtable is shown in Figure 1. When a write operation reaches the sub-table server, the transaction information is first recorded in the log; after that succeeds, the record is inserted into the Memtable, an ordered in-memory buffer. Because memory space is limited, when the Memtable size reaches the threshold, it wil...
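
The write path described in the background (log first, then Memtable, then a size check) can be sketched as below. The class `SubTableServer` and its methods are hypothetical, size is measured as an entry count purely for simplicity, and the sketch stops at reporting that the threshold has been reached because the source text is cut off at that point.

```python
from typing import Dict, List, Tuple


class SubTableServer:
    """Toy model of the write path: append to the log, then update the Memtable."""

    def __init__(self, memtable_threshold: int) -> None:
        self.log: List[Tuple[str, str]] = []
        self.memtable: Dict[str, str] = {}
        # Assumption: the threshold is a number of entries, not bytes.
        self.memtable_threshold = memtable_threshold

    def put(self, key: str, value: str) -> bool:
        """Apply one write; return True once the Memtable has reached its threshold."""
        self.log.append((key, value))  # 1. record the write in the log first
        self.memtable[key] = value     # 2. then insert it into the Memtable
        return len(self.memtable) >= self.memtable_threshold

    def sorted_items(self) -> List[Tuple[str, str]]:
        """Memtable contents in key order, as an ordered buffer would expose them."""
        return sorted(self.memtable.items())


server = SubTableServer(memtable_threshold=2)
full_after_first = server.put("b", "2")
full_after_second = server.put("a", "1")
```

The second write brings the Memtable to its threshold, which is the condition the background paragraph goes on to discuss.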

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): H04L29/06, H04L29/08
Inventor: 樊凯, 史晓丽, 谈苗苗, 李晖
Owner: XIDIAN UNIV