Column partition based numerical data compression method for column storage database

A numerical data compression technology, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem that lightweight compression algorithms cannot be applied directly to data with no obvious distribution pattern.

Inactive Publication Date: 2012-06-27
WUHAN DAMENG DATABASE

AI Technical Summary

Problems solved by technology

However, when the data have no obvious distribution pattern, it is impossible to directly use lightweight ...


Examples


Embodiment

[0020] Define the base table Table1 (col1 int) in the database. A batch of numerical values, 143, 768, 4, 20000, 453, 1081143, 1048581, needs to be loaded into the database. These values show no obvious distribution pattern, so it is difficult to compress them directly with lightweight compression algorithms. Now split each value into two sub-columns, one holding the high 16 bits and one holding the low 16 bits. After the split, the high 16 bits of the first five values are all 0, and the high 16 bits of the last two values are both 10H. These local characteristics can therefore be exploited to compress the data further.
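The split described in [0020] can be illustrated with a minimal Python sketch. The helper names `split_high_low` and `merge_high_low` are hypothetical and do not come from the patent; the sketch assumes non-negative values that fit in 32 bits.

```python
# Sample values from paragraph [0020].
values = [143, 768, 4, 20000, 453, 1081143, 1048581]

def split_high_low(value, low_bits=16):
    """Split a non-negative integer into (high part, low part) at low_bits."""
    mask = (1 << low_bits) - 1
    return value >> low_bits, value & mask

def merge_high_low(high, low, low_bits=16):
    """Inverse of split_high_low: rebuild the original value."""
    return (high << low_bits) | low

# Build the two sub-columns.
high_col = [split_high_low(v)[0] for v in values]
low_col = [split_high_low(v)[1] for v in values]

print([hex(h) for h in high_col])  # last two entries are 0x10, i.e. 10H
print(low_col)

# The split is lossless: merging the sub-columns restores the input.
restored = [merge_high_low(h, l) for h, l in zip(high_col, low_col)]
```

The high sub-column consists of five zeros followed by two copies of 10H, which is the local regularity the paragraph refers to.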

[0021] Three characteristic values of the data distribution are computed separately for each sub-column: the number of distinct values B, the number of value changes CB, and the maximum difference MD between any value and the minimum value. The statistical results are shown in Table 1.
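As a rough illustration of the three statistics in [0021], the sketch below computes B, CB and MD for the two sub-columns of the example. It assumes that CB counts positions where a value differs from its predecessor and that MD equals the maximum minus the minimum; the function name `column_stats` is hypothetical, and the figures it produces come from this reading rather than from Table 1.

```python
def column_stats(column):
    """Return (B, CB, MD) for a list of integers.

    B  : number of distinct values
    CB : number of positions where the value differs from the previous one
    MD : largest difference between any value and the column minimum
    """
    b = len(set(column))
    cb = sum(1 for prev, cur in zip(column, column[1:]) if prev != cur)
    md = max(column) - min(column)
    return b, cb, md

# Sub-columns obtained by splitting the sample values at 16 bits.
high_col = [0, 0, 0, 0, 0, 16, 16]
low_col = [143, 768, 4, 20000, 453, 32567, 5]

high_stats = column_stats(high_col)
low_stats = column_stats(low_col)
print("high:", high_stats)
print("low: ", low_stats)
```

Under these assumptions the high sub-column has few distinct values and only one change, while the low sub-column changes at every position.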

[0...


Abstract

The invention provides a numerical data compression method for a column storage database. The method extracts a column of data from a data table, divides the column into multiple sub-columns such that every value within a given sub-column occupies the same amount of space, computes characteristic values of the data distribution in each sub-column, and compresses each sub-column with the lightweight compression algorithm selected according to those characteristic values. The method is simple to operate and applies to both regularly and irregularly distributed data.
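The selection step can be sketched as follows. The text available here does not state the actual selection rules, so the thresholds and the scheme names in `choose_scheme` are purely illustrative assumptions, as are the function names. Run-length encoding is used only as one familiar example of a lightweight algorithm.

```python
def choose_scheme(column, width_bits=16):
    """Pick a lightweight scheme from B, CB and MD (hypothetical rules)."""
    n = len(column)
    b = len(set(column))
    cb = sum(1 for prev, cur in zip(column, column[1:]) if prev != cur)
    md = max(column) - min(column)
    if cb + 1 <= n // 2:              # few runs: run-length encoding pays off
        return "run-length"
    if md.bit_length() < width_bits:  # offsets from the minimum need fewer bits
        return "frame-of-reference"
    if b <= n // 2:                   # few distinct values: dictionary coding
        return "dictionary"
    return "uncompressed"

def run_length_encode(column):
    """Encode a list as (value, run length) pairs."""
    runs = []
    for v in column:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]

# Sub-columns of 143, 768, 4, 20000, 453, 1081143, 1048581 split at 16 bits.
high_col = [0, 0, 0, 0, 0, 16, 16]
low_col = [143, 768, 4, 20000, 453, 32567, 5]

high_scheme = choose_scheme(high_col)
low_scheme = choose_scheme(low_col)
high_runs = run_length_encode(high_col)
print(high_scheme, high_runs)
print(low_scheme)
```

With these illustrative rules the high sub-column collapses to two runs, while the low sub-column is routed to a different scheme, which shows why the statistics are gathered per sub-column rather than for the whole column.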

Description

Technical field

[0001] The invention belongs to the field of computer database technology, and in particular relates to a method for compressing numerical data in a column storage database based on column decomposition.

Background technique

[0002] With the wide application of database technology, query-intensive applications such as online analytical processing, data warehousing and data mining place higher demands on the query performance of database management systems. Data compression is one way to improve that performance. Its purpose is to reduce I/O and thereby improve query efficiency. Generally, regularities in the data distribution are exploited to reduce storage space. However, the data must be decompressed during query processing. When the time spent on decompression exceeds the time saved by data lookup and transmission overhead brought by compr...


Application Information

IPC(8): G06F17/30
Inventor 郭琰章涛吴恒山
Owner WUHAN DAMENG DATABASE