Column-storage oriented area-level data compression method

A data compression technology and method, applied in fields such as electrical digital data processing, special data processing applications, and instruments, which addresses the problem of high learning time complexity

Inactive Publication Date: 2012-07-25
DONGHUA UNIV
0 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

However, learning for each region has h

Embodiment Construction

[0018] In order to make the present invention more comprehensible, a preferred embodiment is described in detail as follows.

[0019] The present invention provides an area-level data compression method based on column storage, the steps of which are as follows:

[0020] Step 1. For data stored by column, the data in any column A_i logically corresponds to a data segment S_i, where S_i ∈ S and S is the set of all data segments. Each data segment is evenly divided into several areas; an area is a series of contiguous blocks, and the data records of the column (hereinafter referred to as items) are stored in the blocks in sequence;
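The division in Step 1 can be sketched as follows. This is a minimal illustration only: the function name `split_into_areas` and the parameters `items_per_block` and `blocks_per_area` are assumptions made for the example and do not come from the patent text, which does not state block or area sizes here.

```python
from typing import List, Sequence


def split_into_areas(
    column: Sequence,
    items_per_block: int,
    blocks_per_area: int,
) -> List[List[list]]:
    """Split one column's items into blocks, then group contiguous blocks into areas.

    Hypothetical helper: the sizes are illustrative parameters, not values
    taken from the patent.
    """
    if items_per_block <= 0 or blocks_per_area <= 0:
        raise ValueError("sizes must be positive")

    # Items are stored in blocks in their original order.
    blocks = [
        list(column[i:i + items_per_block])
        for i in range(0, len(column), items_per_block)
    ]
    # An area is a run of contiguous blocks; the last area may be shorter.
    return [
        blocks[i:i + blocks_per_area]
        for i in range(0, len(blocks), blocks_per_area)
    ]


areas = split_into_areas(list(range(10)), items_per_block=2, blocks_per_area=2)
print(len(areas))  # number of areas in the data segment
```

With ten items, two items per block, and two blocks per area, the segment holds five blocks grouped into three areas, the last of which contains a single block.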

[0021] Step 2. Define a set of statistical information. The statistical information of the i-th area is recorded as the set T_i = {t, o, r, s, a, d, n, c, l}, where t indicates the data type of the i-th area, o indicates whether the i-th area is sorted, r indicates the number of items in the i-th area, s indicates the n...
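As a rough sketch, the three statistics whose meaning is given above (t, o, r) could be gathered for one area as shown here. The class `AreaStats` and the function `basic_stats` are hypothetical names, and treating "sorted" as non-decreasing order is an assumption. The remaining members of T_i are deliberately left out because their definitions are cut off in this text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AreaStats:
    t: str   # data type of the area's items
    o: bool  # whether the area is sorted (assumed: non-decreasing)
    r: int   # number of items in the area


def basic_stats(items: Sequence) -> AreaStats:
    """Compute t, o and r for one area's items (hypothetical helper)."""
    t = type(items[0]).__name__ if items else "empty"
    # Compare each item with its successor; an empty or single-item area counts as sorted.
    o = all(a <= b for a, b in zip(items, items[1:]))
    return AreaStats(t=t, o=o, r=len(items))


print(basic_stats([1, 2, 2, 5]))
```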

Abstract

The invention relates to a column-storage-oriented data compression method, characterized by comprising: step 1, dividing data stored by column into a plurality of areas; step 2, defining a group of statistical information for the data in the areas; step 3, defining a group of statistics for each area in turn using the statistical information of step 2, so that the characteristics of the data distribution in each area are estimated quantitatively; step 4, computing a similarity factor between each two adjacent areas according to the learned statistics of the two areas; step 5, computing the value of each statistic in turn for the first area in a column, and selecting a compression method from those values by a stepwise selection method; step 6, for each remaining area, computing the similarity factor between it and the adjacent previous area according to their statistic values, directly applying the compression strategy of the adjacent previous area if the two areas are similar, and otherwise selecting the compression method again in the manner of step 5; and step 7, compressing the current area according to the compression method obtained. The method is based on column storage, compresses data area by area, provides an efficient compression strategy selection method, and can effectively support column-storage-oriented management of massive data.
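The control flow of steps 4 to 6 can be illustrated with the sketch below. It shows only the reuse-or-reselect pattern. The statistics in `learn_stats`, the rules in `select_method`, the `similarity` formula, and the threshold of 0.9 are all placeholders assumed for the example; they are not the statistics, similarity factor, or stepwise selection method defined by the patent.

```python
from typing import Dict, List, Optional, Sequence

Stats = Dict[str, float]


def learn_stats(area: Sequence) -> Stats:
    """Placeholder statistics for one area (not the patent's set T_i)."""
    n = len(area)
    runs = (1 + sum(1 for a, b in zip(area, area[1:]) if a != b)) if n else 0
    return {
        "sorted": float(all(a <= b for a, b in zip(area, area[1:]))),
        "distinct_ratio": len(set(area)) / n if n else 0.0,
        "avg_run": n / runs if runs else 0.0,
    }


def select_method(stats: Stats) -> str:
    """Placeholder rule-based selection standing in for the stepwise method."""
    if stats["avg_run"] >= 4:
        return "rle"
    if stats["distinct_ratio"] <= 0.1:
        return "dictionary"
    if stats["sorted"]:
        return "delta"
    return "none"


def similarity(a: Stats, b: Stats) -> float:
    """Assumed similarity factor: 1 minus the mean relative difference."""
    diffs = []
    for key in a:
        denom = max(abs(a[key]), abs(b[key]), 1e-12)
        diffs.append(abs(a[key] - b[key]) / denom)
    return 1.0 - sum(diffs) / len(diffs)


def plan_compression(areas: Sequence[Sequence], threshold: float = 0.9) -> List[str]:
    """Pick a method for the first area, then reuse it while neighbours stay similar."""
    plans: List[str] = []
    prev: Optional[Stats] = None
    for area in areas:
        stats = learn_stats(area)
        if prev is not None and similarity(prev, stats) >= threshold:
            plans.append(plans[-1])             # similar: reuse previous strategy
        else:
            plans.append(select_method(stats))  # first area or dissimilar: reselect
        prev = stats
    return plans


print(plan_compression([[1] * 8, [2] * 8, [1, 2, 3, 4, 5, 6, 7, 8]]))
```

In this example the second area has the same placeholder statistics as the first and inherits its method, while the third differs enough to trigger a fresh selection.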

Description

Technical field

[0001] The invention relates to an area-level compression method based on column storage.

Background technique

[0002] At present, the amount of data held by analytical applications such as data warehouses has increased sharply. To improve the performance of read-optimized systems, people have begun to consider a storage method different from traditional row storage: column storage. Column storage stores data tables in units of columns, so that the values of the same attribute across the records of a table are stored together. At query time, only the required columns need to be read into memory, which reduces the amount of data read and improves the query efficiency of the system. However, the amount of data a data warehouse must process is very large, which causes a large amount of I/O during queries. Because CPU processing speed and disk access speed have developed unevenly, I/O has become the bottleneck of queries. Therefore, reduci...
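The difference between row storage and column storage described in the background can be sketched in a few lines. The table contents and the helper names `to_columns` and `sum_column` are invented for illustration; a real column store works on disk blocks rather than in-memory lists.

```python
from typing import Dict, List


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Convert row-oriented records into one list per attribute."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_column(columns: Dict[str, list], name: str) -> float:
    """Aggregate one attribute; only that column's list is touched."""
    return sum(columns[name])


# Illustrative data only.
rows = [
    {"id": 1, "region": "east", "amount": 10},
    {"id": 2, "region": "west", "amount": 20},
    {"id": 3, "region": "east", "amount": 30},
]
columns = to_columns(rows)
print(sum_column(columns, "amount"))
```

Values of the same attribute sit next to each other in each list, which is also what makes per-column (and per-area) compression attractive.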

Application Information

IPC IPC(8): G06F17/30
Inventor 乐嘉锦, 王梅, 夏小玲
Owner DONGHUA UNIV