Massive data continuous analysis system suitable for stream processing

A large-scale data analysis system, applied in the field of data analysis, that addresses the poor adaptability, underused strengths, and lack of architectural improvement of existing systems. It aims to ensure scalability and reliability, make configuration and replacement easy, and improve response speed.

Status: Inactive
Publication Date: 2012-07-04
HUAZHONG UNIV OF SCI & TECH
3 Cites, 59 Cited by

AI Technical Summary

Problems solved by technology

However, most integration happens only at the interface level; few systems are integrated at the architectural level.
[0004] Existing systems that integrate the MR framework and databases at the architectural level are still incompletely integrated, fail to make full use of the strengths of both, and do little to improve the existing architecture, so they cannot adapt well to diverse, fast data analysis needs.
Problems such as the long data import process and the batch-oriented design of MR have not been well resolved.

Examples

[0063] To verify the feasibility and effectiveness of the system of the present invention, the system was deployed in a real environment: a cluster of five virtual machines hosted on a single server. The server runs the VMware Workstation virtual machine platform, and the five virtual machines form a Hadoop and database cluster consisting of one master node and four worker nodes. The detailed experimental environment configuration is shown in Table 1.

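[0064] Table 1 Experimental environment configuration

[0065]

| | Server | Master node | Worker node 1 | Worker node 2 | Worker node 3 | Worker node 4 |
| --- | --- | --- | --- | --- | --- | --- |
| CPU | 4 × 4 cores | 2 cores | 2 cores | 2 cores | 2 cores | 2 cores |
| Memory | 24 GB | 2 GB | 2 GB | 2 GB | 2 GB | 2 GB |
| Disk | 1 TB | 100 GB | 100 GB | 100 GB | 100 GB | 100 GB |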

operating system

...

Abstract

The invention discloses a continuous analysis system for massive data, suitable for stream processing. It comprises a metadata management module, a query plan generation module, a data import task generation module, an incremental processing module, an MR (MapReduce) message processing module, and a database connection module. The metadata management module manages the meta-information of data tables and databases; the query plan generation module receives a query request and generates an optimal query plan; the data import task generation module receives a data import request and generates a set of data import MR jobs; the incremental processing module incrementally submits data import and query jobs to the Hadoop system in parallel; the MR message processing module receives the result of a Map or Reduce function of the Hadoop system and outputs it to the Reduce end or to the next job; and the database connection module serves as the interface between the Hadoop system and the databases. The Hadoop system organizes the databases on the nodes and executes data import and data query simultaneously, and a pipelining technique improves the MR execution flow, so that data queries run as a continuous stream and the time needed to analyze massive data is greatly shortened.
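The pipelining idea in the abstract can be illustrated with a small, single-process sketch. It is not the patented implementation and does not use Hadoop: the function names (`import_increments`, `map_phase`, `pipelined_sum`), the `(key, value)` record shape, and the choice of a running sum as the query are all assumptions made for illustration. The sketch only shows the general pattern of importing data in increments and feeding Map output to the Reduce side as it is produced, so that partial query results appear continuously instead of after one large batch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]  # hypothetical (key, value) row, not from the patent


def import_increments(rows: List[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield the input in small increments instead of one bulk load."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def map_phase(batch: Iterable[Record]) -> Iterator[Record]:
    """Map: emit each (key, value) pair as soon as it is produced."""
    for key, value in batch:
        yield key, value


def pipelined_sum(rows: List[Record], batch_size: int) -> Iterator[Dict[str, int]]:
    """Reduce: fold Map output into running totals and yield a snapshot
    after every imported increment, so results arrive as a stream."""
    totals: Dict[str, int] = defaultdict(int)
    for batch in import_increments(rows, batch_size):
        for key, value in map_phase(batch):
            totals[key] += value
        yield dict(totals)  # copy, so earlier snapshots are not mutated


if __name__ == "__main__":
    data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    for snapshot in pipelined_sum(data, batch_size=2):
        print(snapshot)
```

With a batch size of 2, the five sample rows produce three snapshots, the last of which holds the complete totals. In a batch-style MR job, by contrast, the Reduce step would see nothing until every Map task had finished.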

Description

Technical field

[0001] The invention belongs to the field of data analysis, and in particular relates to a large-scale data continuous analysis system suitable for stream processing. It is suited to parallel analysis and computation over large-scale data and to data analysis applications that demand fast query response times.

Background

[0002] With the arrival of the big data era, the problem of extracting valuable information from massive data has made large-scale data analysis increasingly important and has placed higher demands on data analysis systems. The traditional approach of using a single database management system (DBMS) for data analysis can no longer keep up with ever-growing data volumes, nor can it meet diverse, fast data analysis requirements.

[0003] The two main existing kinds of large-scale data analysis system, parallel DBMSs and systems based on the MapReduce (MR) framework, both have shortcomings...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L12/24, H04L12/26, G06F17/30
Inventors: 金海, 赵峰, 袁平鹏, 张冬洁
Owner: HUAZHONG UNIV OF SCI & TECH