An offline real-time data processing method and system based on a big data framework

The invention applies real-time processing and big data technology in the fields of electrical digital data processing and special data processing applications. It addresses the inability of the Hadoop platform to process real-time data and the inability of the Storm platform to process offline data, and achieves strong scalability, fault tolerance, and efficient, complete data processing.

Status: Inactive
Publication Date: 2018-12-11
SOUTH CHINA UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0003] The main purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a method for offline and real-time processing of data based on a big data framework. The framework can be used on the Internet for tasks such as analysis of user query logs, social network user behavior, and e-commerce user behavior. It effectively overcomes two limitations: the Hadoop platform cannot process real-time data, and the Storm platform cannot process offline data.

Examples

Embodiment Construction

[0047] The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0048] As shown in Figures 1, 3, and 4, a method for offline and real-time processing of data based on a big data framework comprises the following steps:

[0049] Step 1. Offline and real-time data collection

[0050] Build the Hadoop and Storm platform environments on the machines and install MySQL, Flume, and other related software. Configure the Flume configuration files to transmit data in Avro mode. Each machine runs one Flume agent. An agent contains multiple Sources and Sinks, and a Channel serves as the conduit connecting the two.
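The agent layout described in step 1 can be sketched as a small Python helper that renders a Flume properties file. This is only an illustrative sketch, not the configuration used in the patent: the agent name, component names (`r1`, `c_offline`, `c_realtime`, `k_hdfs`, `k_kafka`), the bind address, port, HDFS path, broker address, and topic are all hypothetical, and the choice of one Avro source fanned out to an HDFS sink and a Kafka sink is an assumption based on the abstract.

```python
def render_flume_agent(agent, avro_bind, avro_port, hdfs_path, kafka_brokers, kafka_topic):
    """Render a Flume properties file for one agent (all names are hypothetical).

    One Avro source is attached to two memory channels. Flume's default
    channel selector replicates each event to every attached channel, so
    both the HDFS sink and the Kafka sink see the same events.
    """
    props = {
        f"{agent}.sources": "r1",
        f"{agent}.channels": "c_offline c_realtime",
        f"{agent}.sinks": "k_hdfs k_kafka",
        # Avro source: receives events sent in Avro mode
        f"{agent}.sources.r1.type": "avro",
        f"{agent}.sources.r1.bind": avro_bind,
        f"{agent}.sources.r1.port": str(avro_port),
        f"{agent}.sources.r1.channels": "c_offline c_realtime",
        # Channels: buffers that connect the source to each sink
        f"{agent}.channels.c_offline.type": "memory",
        f"{agent}.channels.c_realtime.type": "memory",
        # Offline path: HDFS sink
        f"{agent}.sinks.k_hdfs.type": "hdfs",
        f"{agent}.sinks.k_hdfs.hdfs.path": hdfs_path,
        f"{agent}.sinks.k_hdfs.channel": "c_offline",
        # Real-time path: Kafka sink
        f"{agent}.sinks.k_kafka.type": "org.apache.flume.sink.kafka.KafkaSink",
        f"{agent}.sinks.k_kafka.kafka.bootstrap.servers": kafka_brokers,
        f"{agent}.sinks.k_kafka.kafka.topic": kafka_topic,
        f"{agent}.sinks.k_kafka.channel": "c_realtime",
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items())


if __name__ == "__main__":
    print(render_flume_agent("a1", "0.0.0.0", 41414,
                             "hdfs://namenode/flume/events",
                             "broker:9092", "events"))
```

Note that a source lists its channels in the plural (`channels`), while each sink reads from exactly one channel (`channel`).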

[0051] Step 2. Data storage and caching

[0052] A Source collects data from the data source and passes it to a Channel; a Sink takes data from the Channel and outputs it. The output offline data is uploaded to the HDFS distributed...
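The Source, Channel, and Sink roles can be illustrated with a minimal in-memory Python model. It is a sketch under stated assumptions, not Flume itself: the `Channel` class, `run_source`, `run_sink`, and `demo_flume_flow` are hypothetical names, plain lists stand in for HDFS and a Kafka topic, and events are assumed to be non-`None` values.

```python
from collections import deque


class Channel:
    """FIFO buffer that decouples a source from a sink."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._events.popleft() if self._events else None


def run_source(records, channels):
    """Source: read records and copy each one into every attached channel."""
    for record in records:
        for channel in channels:
            channel.put(record)


def run_sink(channel, output):
    """Sink: drain the channel and append each event to the output store."""
    while (event := channel.take()) is not None:
        output.append(event)
    return output


def demo_flume_flow():
    offline_channel, realtime_channel = Channel(), Channel()
    run_source(["log-1", "log-2", "log-3"], [offline_channel, realtime_channel])
    offline_store = run_sink(offline_channel, [])    # stand-in for HDFS
    realtime_queue = run_sink(realtime_channel, [])  # stand-in for a Kafka topic
    return offline_store, realtime_queue
```

Because the channel buffers events, the source and each sink can run at different rates, which is why the same collected data can feed an offline store and a real-time queue independently.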

Abstract

The invention discloses an offline and real-time data processing method based on a big data framework, comprising the following steps: collecting data from a data source through Flume; in the offline module, uploading the collected offline data to the HDFS distributed file system; in the real-time module, uploading the real-time data to a Kafka cluster; preprocessing the data on HDFS with MapReduce; preprocessing the data on the Kafka cluster with Storm; importing the preprocessed offline data into a Hive data warehouse; importing the preprocessed real-time data into a MySQL database; developing and analyzing the offline data in the Hive data warehouse and transmitting the results to a front-end web page for data visualization; and transmitting the real-time data from the MySQL database to the front-end web page for data visualization. The method processes offline data and real-time data simultaneously and can meet a variety of data processing requirements.
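The difference between the two preprocessing paths can be sketched in plain Python. This is a hedged illustration, not the patent's MapReduce or Storm code: `batch_count`, `StreamCounter`, and `demo_batch_vs_stream` are hypothetical names, a word count stands in for the unspecified preprocessing logic, and the sample log lines are invented.

```python
from collections import Counter, defaultdict


def batch_count(lines):
    """Offline path: map each line to (key, 1) pairs, group by key, then reduce."""
    mapped = [(word, 1) for line in lines for word in line.split()]
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


class StreamCounter:
    """Real-time path: update running counts as each record arrives."""

    def __init__(self):
        self.counts = Counter()

    def on_record(self, line):
        self.counts.update(line.split())
        # Snapshot of the current state, e.g. a row set written to a database.
        return dict(self.counts)


def demo_batch_vs_stream():
    logs = ["search hadoop", "search storm", "click hadoop"]
    offline_result = batch_count(logs)          # computed once over the full data set
    stream = StreamCounter()
    snapshots = [stream.on_record(line) for line in logs]  # one result per record
    return offline_result, snapshots
```

The batch function needs the complete data set before it produces anything, while the stream counter yields an up-to-date result after every record; once all records have arrived, the last snapshot matches the batch result.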

Description

Technical field

[0001] The invention relates to the field of big data processing and analysis, in particular to a method and system for offline and real-time data processing based on a big data framework.

Background art

[0002] With the rapid development of network science and technology, people's use of the Internet keeps increasing, and the amount of data generated by user behavior is growing exponentially. The behavioral data generated by users on the Internet every second reaches hundreds of gigabytes. It is fair to say that we have entered an era of pervasive data. Big data offers multiple application platform frameworks, but each has its own advantages, disadvantages, and application scenarios. MapReduce in Hadoop is a framework designed specifically for offline batch processing; it is stable but slow and cannot process data in real time. Storm makes up for the problem that Ma...

Application Information

IPC(8): G06F17/30
Inventors: 柯峰 (Ke Feng), 梁烜彰 (Liang Xuanzhang)
Owner: SOUTH CHINA UNIV OF TECH