Hadoop-based mass stream data storage and query method and system

Query method and query system technology applied in the field of massive data management

Inactive Publication Date: 2011-03-30
INST OF COMPUTING TECH CHINESE ACAD OF SCI
4 Cites, 83 Cited by

AI Technical Summary

Problems solved by technology

Addresses the problem of performing efficient statistical analysis in massive stream data applications.

Embodiment Construction

[0078] To make the purpose, technical solution and advantages of the present invention clearer, the Hadoop-based mass stream data storage and query method and system of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

[0079] Targeting the time-ordered nature of structured stream data, the Hadoop-based mass stream data storage and query method and system of the present invention proposes a segment-level column-cluster storage structure built on the Hadoop Distributed File System (HDFS), together with a SCANMAP optimization mechanism. SCANMAP uses the summary information recorded in the segment-level column-cluster storage structure to quickly filter the data and so improve query processing efficiency. At the same time, using column storage ...
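
The filtering idea in [0079] can be illustrated with a minimal sketch. The text here does not specify what the summary information contains, so the sketch assumes each page records the minimum and maximum value of a time-ordered column. `PageSummary` and `build_scan_map` are hypothetical names chosen for illustration, not the patent's SCANMAP implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageSummary:
    """Hypothetical per-page summary: value range of one time-ordered column."""
    page_id: int
    min_ts: int
    max_ts: int


def build_scan_map(summaries: List[PageSummary], lo: int, hi: int) -> List[bool]:
    """Return one flag per page: True if the page may hold rows with lo <= ts <= hi."""
    return [not (s.max_ts < lo or s.min_ts > hi) for s in summaries]


summaries = [
    PageSummary(0, 100, 199),
    PageSummary(1, 200, 299),
    PageSummary(2, 300, 399),
]

# Only pages whose range overlaps [250, 320] need to be read.
scan_map = build_scan_map(summaries, lo=250, hi=320)
print(scan_map)  # [False, True, True]
```

Because stream data arrives in time order, page ranges on a time column tend not to overlap, so a range predicate can rule out most pages from the summaries alone.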

Abstract

The invention discloses a Hadoop-based mass stream data storage and query method and a corresponding system. The method comprises the following steps: constructing a segmented column-cluster storage structure; storing stream data sequentially as column-cluster records; compressing the column-cluster records from front to back to obtain compressed data pages; writing each compressed data page into a piece of column-cluster data while appending the page summary information of the compressed data pages to the tail of the column-cluster data, yielding a complete data segment; and, when executing a query statement, using the page summary information at the tail of each data segment to construct a scan table according to the filter constraints, so that the data can be filtered quickly.
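
The segment layout described in the abstract (compressed pages first, page summary information at the tail) can be sketched as follows. This is an illustration under stated assumptions, not the patent's on-disk format: the JSON encoding, the 4-byte length trailer, the min/max summary fields, and the names `write_segment` and `query_segment` are all hypothetical, and an in-memory byte string stands in for an HDFS file.

```python
import json
import struct
import zlib
from typing import List, Tuple

Record = Tuple[int, str]  # (timestamp, url): a hypothetical two-column cluster


def write_segment(records: List[Record], page_size: int) -> bytes:
    """Compress records page by page, then append the page summaries at the tail."""
    body = bytearray()
    summaries = []
    for start in range(0, len(records), page_size):
        page = records[start:start + page_size]
        compressed = zlib.compress(json.dumps(page).encode("utf-8"))
        summaries.append({
            "offset": len(body),
            "length": len(compressed),
            "min_ts": min(r[0] for r in page),
            "max_ts": max(r[0] for r in page),
        })
        body += compressed
    tail = json.dumps(summaries).encode("utf-8")
    # The final 4 bytes hold the tail length so a reader can locate the summaries.
    return bytes(body) + tail + struct.pack(">I", len(tail))


def query_segment(segment: bytes, lo: int, hi: int) -> List[Record]:
    """Read the tail first, then decompress only pages overlapping [lo, hi]."""
    (tail_len,) = struct.unpack(">I", segment[-4:])
    summaries = json.loads(segment[-4 - tail_len:-4])
    result = []
    for s in summaries:
        if s["max_ts"] < lo or s["min_ts"] > hi:
            continue  # page skipped without being decompressed
        raw = zlib.decompress(segment[s["offset"]:s["offset"] + s["length"]])
        result.extend((ts, url) for ts, url in json.loads(raw) if lo <= ts <= hi)
    return result


records = [(t, f"/page/{t % 3}") for t in range(10)]
segment = write_segment(records, page_size=4)
print(query_segment(segment, 5, 6))  # [(5, '/page/2'), (6, '/page/0')]
```

Placing the summaries at the tail lets the writer stream pages out in arrival order and emit the index only once the segment is complete, which fits an append-only file system such as HDFS.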

Description

Technical field

[0001] The invention relates to the field of massive data management, and in particular to a Hadoop-based method and system for storing and querying massive stream data.

Background

[0002] With the automation of data generation, more and more applications require persistent storage of continuously growing stream data for subsequent query analysis and data mining, which poses severe challenges to the management of massive stream data.

[0003] In terms of storage, both the total volume of stream data and the daily volume are large. Taking domestic Internet companies as an example, about 5TB of web page click stream data is generated every day, totaling more than 600 million records. These click stream data need to be persistently stored in the system and used for statistical analysis of the day's reports (such as calculating the day's page views (PV) and unique visitors (UV)), as we...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
Inventors: 郭斯杰, 熊劲
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI