Method and system for distributed computation and querying of massive data in online analytical processing

A technology in the field of online analytical processing and distributed computing, applicable to computing, electrical digital data processing, and special data processing applications. It addresses the problem that the use of MapReduce for data cube computation and query tasks has not been studied.

Inactive Publication Date: 2008-05-21
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

However, the current literature does not study how to use MapReduce to handle the computation and query tasks of the data cube, or how many Map and Reduce tasks allow the data cube to strike a balance between storage space and query time.


Examples

Embodiment Construction

[0027] Embodiments of the present invention will be further described below in conjunction with the accompanying drawings, but the present invention is not limited thereto.

[0028] As shown in FIG. 1, the cluster system adopted by the present invention consists mainly of a name node and data nodes. The name node divides data into blocks, distributes the blocks to the nodes, and controls the reading and writing of the blocks; that is, it manages the data nodes and schedules distributed computing tasks. The data nodes store the data blocks and execute the Map and Reduce computing tasks.
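
The division of roles in [0028] can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class names `NameNode` and `DataNode`, the method names, and the round-robin placement policy are hypothetical and are not taken from the patent, which does not say how blocks are assigned to nodes.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Stores data blocks and runs Map tasks on them (hypothetical model)."""
    name: str
    blocks: dict = field(default_factory=dict)  # block_id -> list of records

    def store(self, block_id, records):
        self.blocks[block_id] = list(records)

    def run_map(self, block_id, map_fn):
        # The Map task runs on the node that already holds the block.
        return map_fn(self.blocks[block_id])


@dataclass
class NameNode:
    """Tracks block placement and schedules tasks on the data nodes."""
    data_nodes: list
    placement: dict = field(default_factory=dict)  # block_id -> DataNode

    def distribute(self, blocks):
        # Assumption: round-robin placement; the source names no policy.
        for block_id, records in enumerate(blocks):
            node = self.data_nodes[block_id % len(self.data_nodes)]
            node.store(block_id, records)
            self.placement[block_id] = node

    def schedule_map(self, map_fn):
        # One Map task per block, dispatched to the node holding that block.
        return [self.placement[block_id].run_map(block_id, map_fn)
                for block_id in sorted(self.placement)]


nodes = [DataNode("dn1"), DataNode("dn2")]
name_node = NameNode(nodes)
name_node.distribute([[1, 2], [3, 4], [5]])
partial_sums = name_node.schedule_map(sum)
print(partial_sums)  # [3, 7, 5]
```

Here `sum` stands in for an arbitrary Map function; the point is only that the name node keeps the placement table while the data nodes hold the blocks and do the work.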

[0029] As shown in FIG. 2, the present invention processes a large-capacity data set on the cluster system of FIG. 1 as follows:

[0030] 1) MapReduce divides the large-capacity data set to be computed into blocks, each of size equal to the size of the data set divided by the number of Map tasks, and distributes the blocks to the nodes;

[0031] 2) The ...
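
The block-size rule in step 1) can be sketched as follows. The function name `split_into_blocks` is hypothetical, block size is measured in records rather than bytes, and rounding the quotient up so that no record is left over is an assumption; the source only states that the block size equals the data set size divided by the number of Map tasks.

```python
import math


def split_into_blocks(records, num_map_tasks):
    """Split records into at most num_map_tasks contiguous blocks."""
    if num_map_tasks < 1:
        raise ValueError("num_map_tasks must be at least 1")
    # Block size = data set size / number of Map tasks, rounded up so that
    # no record is left over (the rounding rule is an assumption).
    block_size = max(1, math.ceil(len(records) / num_map_tasks))
    return [records[i:i + block_size]
            for i in range(0, len(records), block_size)]


blocks = split_into_blocks(list(range(10)), num_map_tasks=4)
print(blocks)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

With ten records and four Map tasks the block size is 3, so the last block is shorter than the others.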

Abstract

The invention discloses a method and system for distributed computation and querying of massive data in online analytical processing. The method uses a cluster system to perform distributed pre-computation and querying on data cubes. Based on the MapReduce framework, the invention divides a large-capacity data set into blocks and distributes them to the nodes through MapReduce; the Map task on each node then computes a corresponding local closed cube for each data block; finally, Map tasks started on different nodes query the local closed cubes in parallel, and the Reduce task merges the measure values obtained from the queries. The invention can simply and effectively pre-compute and query large-capacity data for online analytical processing, greatly compresses the storage space of the data cube, and responds quickly to user queries.
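
The pipeline described in the abstract can be sketched on toy data. Everything here is an illustrative assumption rather than the patented algorithm: the function names, the sample rows, the use of a count measure, the brute-force enumeration of cells, and the sequential loop that stands in for Map tasks running on different nodes. A cell is treated as closed when no wildcard can be replaced by a concrete value without shrinking the set of rows it covers, which is one common definition of a closed cell.

```python
from itertools import product

ALL = "*"  # wildcard meaning "any value" on a dimension


def matches(general, specific):
    """True if `specific` agrees with `general` on every non-wildcard dimension."""
    return all(g == ALL or g == s for g, s in zip(general, specific))


def local_closed_cube(block):
    """Map task (pre-computation): closed cube with a count measure for one block."""
    cells = set()
    for row in block:
        # Every generalization of a row is a non-empty cell of the full cube.
        for mask in product([False, True], repeat=len(row)):
            cells.add(tuple(ALL if m else v for m, v in zip(mask, row)))
    closed = {}
    for cell in cells:
        cover = [row for row in block if matches(cell, row)]
        # Closed: on every wildcard dimension the covered rows disagree.
        is_closed = all(len({row[d] for row in cover}) > 1
                        for d, c in enumerate(cell) if c == ALL)
        if is_closed:
            closed[cell] = len(cover)
    return closed


def query_local(closed_cube, query):
    """Map task (query): count for `query` from one local closed cube."""
    # The closure of the query is the matching closed cell with the largest
    # count; this holds because count never grows when a cell is specialized.
    return max((n for cell, n in closed_cube.items() if matches(query, cell)),
               default=0)


def reduce_counts(partials):
    """Reduce task: merge the per-block measure values."""
    return sum(partials)


# Hypothetical (city, product) rows, already split into two blocks.
blocks = [
    [("a", "x"), ("a", "y"), ("b", "x")],
    [("a", "x"), ("c", "x")],
]
cubes = [local_closed_cube(b) for b in blocks]  # sequential stand-in for parallel Map tasks
total = reduce_counts(query_local(c, ("a", ALL)) for c in cubes)
print(total)  # 3
```

In this toy data the first block's full cube has 8 non-empty cells of which 6 are closed, and the second has 6 of which 3 are closed, which gives a small picture of the storage saving the abstract refers to. Summing the per-block results is valid for distributive measures such as count; other measures would need a different merge.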

Description

Technical field

[0001] The invention relates to a method and system for distributed pre-computation and querying in OLAP, in particular for OLAP processing of massive data.

Background

[0002] OLAP has been a research hotspot in recent years. It is centered on the dimensional model, that is, the data cube; it is oriented toward analysis and provides users with multi-perspective online data analysis through pre-aggregation. However, with the continuous development of the Internet and the increasing complexity of user needs, high-dimensional, large-capacity data causes an information explosion in the data cube. How to compress the cube effectively and compute it quickly has become a major challenge for OLAP.

[0003] Researchers have proposed many data cube compression algorithms. Yannis Sismanis et al. proposed the Dwarf cube in 2002, which eliminates spatial redundancy by identifying shared prefixes and shared suffixes. Laks V.S. Lakshmanan, Jian Pei et al. ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 奚建清, 游进国, 陈虎, 张平建
Owner: SOUTH CHINA UNIV OF TECH