
OLAP (on-line analytical processing) data storage and query method based on Hadoop

A query method and data table technology, applied to electrical digital data processing, special-purpose data processing applications, instruments, and related fields. It addresses the difficulty of handling frequently changing query patterns, expensive time overhead, and high I/O overhead, and achieves convenient and flexible use, easy extension, and reduced time and hardware costs.

Inactive Publication Date: 2013-10-23
SOUTHEAST UNIV +1

AI Technical Summary

Problems solved by technology

Traditional OLAP solutions store data by row, so reading only a few columns still requires scanning the entire table. The resulting extra I/O overhead is high and grows with the data volume.
Second, as OLAP application data keeps expanding and user queries become more complex, aggregate calculations take larger inputs, produce larger outputs, and become more complex themselves. Traditional aggregation methods consume substantial resources, incur high time overhead, and lack flexibility, making it difficult to cope with frequent changes in query patterns.



Examples


Detailed Description of the Embodiments

[0027] The present invention is further illustrated below with reference to a specific embodiment.

[0028] The present invention provides a Hadoop-based method for storing and querying massive OLAP data, comprising the following steps:

[0029] Step 100: define a column file storage format, HCFile, and use it to store the data table by column;

[0030] As shown in Figure 1, an HCFile consists of data files and index files. A data file contains a file header and multiple data packets. The file header records metadata such as the file version, compression algorithm, and column data type. Each data packet contains a fixed number of records. Because records are of variable length, data packets are also of variable length. An HDFS data block usually contains multiple data packets. An index file consists of a primary index, a secondary index, and a file trailer. A primary index entry is generated for each data packet, and the primary index records the starting position and length of the data...
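The packet and primary-index layout described in [0030] can be illustrated with a minimal Python sketch. This is not the patent's on-disk format: the 4-byte placeholder header `b"HCF1"`, the length-prefixed record encoding, the value of `RECORDS_PER_PACKET`, and all function names are assumptions made for illustration, and the secondary index and file trailer are omitted. It also assumes each primary index entry is an (offset, length) pair for one packet, which is how the truncated sentence above appears to read.

```python
import struct

RECORDS_PER_PACKET = 3  # assumed value; the source only says the count is fixed


def encode_packet(records):
    """Serialize variable-length string records as length-prefixed bytes."""
    out = bytearray()
    for rec in records:
        raw = rec.encode("utf-8")
        out += struct.pack(">I", len(raw)) + raw
    return bytes(out)


def write_column(values, header=b"HCF1"):
    """Return (data_bytes, primary_index) for one column.

    primary_index holds one (offset, length) entry per packet.
    The header is a placeholder, not the real HCFile header.
    """
    data = bytearray(header)
    primary_index = []
    for i in range(0, len(values), RECORDS_PER_PACKET):
        packet = encode_packet(values[i:i + RECORDS_PER_PACKET])
        primary_index.append((len(data), len(packet)))
        data += packet
    return bytes(data), primary_index


def read_record(data, primary_index, row):
    """Fetch one record by row number, decoding only the packet that holds it."""
    offset, length = primary_index[row // RECORDS_PER_PACKET]
    packet = data[offset:offset + length]
    pos = 0
    # Skip the records that precede the target inside the packet.
    for _ in range(row % RECORDS_PER_PACKET):
        (n,) = struct.unpack_from(">I", packet, pos)
        pos += 4 + n
    (n,) = struct.unpack_from(">I", packet, pos)
    return packet[pos + 4:pos + 4 + n].decode("utf-8")
```

Because every packet holds the same number of records, the packet containing a given row follows from integer division, and the index entry then gives the byte range to read even though packet sizes vary.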


Abstract

The invention discloses a Hadoop-based OLAP (online analytical processing) data storage and query method. For data storage, a new column file storage format, HCFile (Hadoop column file), is first defined, and a table storage method based on HCFile is then given. Under this scheme, reading one column of data requires reading only a few HCFiles, without accessing the data of other columns, so I/O (input/output) efficiency is much higher than with row-based storage; likewise, adding a column requires only adding new files, so the table is easy to extend. For aggregation, a data index based on an inverted structure is first created, and MapReduce is then used to implement the basic OLAP aggregation computations, including summation, maximum/minimum, and counting; other aggregation computations can be built from these basic ones, and the efficient data index significantly improves aggregation performance. Compared with the prior art, the method effectively improves data storage and query efficiency, saves hardware resources, reduces time and hardware costs, and is more convenient and flexible to apply.
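As a rough illustration of the aggregation idea in the abstract, the following Python sketch builds an inverted index over a dimension column and then computes sum, minimum, maximum, and count in two map/reduce-style phases. It runs in a single process and does not use the Hadoop MapReduce API; the function names, the `region` and `sales` columns, and the index layout (value to list of row numbers) are all assumptions, not the patent's actual design.

```python
from collections import defaultdict


def build_inverted_index(dimension_column):
    """Map each distinct dimension value to the row numbers holding it."""
    index = defaultdict(list)
    for row, value in enumerate(dimension_column):
        index[value].append(row)
    return dict(index)


def map_phase(index, measure_column):
    """Emit (key, measure) pairs, locating rows through the index."""
    for key, rows in index.items():
        for row in rows:
            yield key, measure_column[row]


def reduce_phase(pairs):
    """Fold pairs into sum, min, max, and count per key."""
    result = {}
    for key, x in pairs:
        if key not in result:
            result[key] = {"sum": x, "min": x, "max": x, "count": 1}
        else:
            agg = result[key]
            agg["sum"] += x
            agg["min"] = min(agg["min"], x)
            agg["max"] = max(agg["max"], x)
            agg["count"] += 1
    return result


# Hypothetical example data: one dimension column and one measure column.
region = ["east", "west", "east", "north", "west"]
sales = [10, 20, 30, 5, 40]
totals = reduce_phase(map_phase(build_inverted_index(region), sales))
```

A derived aggregate such as the average can then be computed from the basic ones, for example `totals["east"]["sum"] / totals["east"]["count"]`.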

Description

Technical field

[0001] The invention belongs to the field of mass data management, and in particular relates to a Hadoop-based OLAP data storage and query method.

Background

[0002] The abbreviations and terms used in the present invention are explained first:

[0003] OLAP: Online Analytical Processing;

[0004] Hadoop: a distributed system infrastructure;

[0005] HDFS: Hadoop Distributed File System;

[0006] HCFile: Hadoop Column File, a Hadoop column storage file;

[0007] MapReduce: a parallel programming framework;

[0008] With the continuous development of information processing and database technology, all industries have gradually built their own information processing systems. Over time, enterprises have accumulated large amounts of historical data, whose scale has grown to terabytes or even petabytes and is still growing at an accelerating rate. In today's increasingly fie...


Application Information

IPC(8): G06F17/30
Inventor: 宋爱波, 宋爱美, 李龙生
Owner: SOUTHEAST UNIV