Method for processing data offline in real time on the basis of Spark big data frame

A real-time processing and big data technology, applied in electrical digital data processing, special-purpose data processing applications, instruments and related fields. It solves the problem that offline data and real-time data cannot be processed at the same time, and achieves strong scalability and fault tolerance, efficient data processing and a complete processing flow.

Status: Inactive; Publication date: 2018-11-23
SOUTH CHINA UNIV OF TECH
Cites: 4; Cited by: 18

AI Technical Summary

Problems solved by technology

[0003] The main purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a method for offline and real-time data processing based on the Spark big data framework, which effectively addresses the inability of other current big data platforms to process offline data and real-time data simultaneously.



Detailed Description of the Embodiments

[0038] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0039] As shown in Figures 1, 2 and 3, a method for offline and real-time data processing based on the Spark big data framework comprises the following steps:

[0040] Step 1. Offline and real-time data collection

[0041] Build the Spark platform environment on the machines and install MySQL, Flume and other related software. Configure the Flume configuration files so that data is transmitted over Avro. Each machine runs a Flume agent; an agent contains multiple Sources and Sinks, with a Channel connecting the two.
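The Source/Channel/Sink arrangement can be illustrated with a small in-process model. This is a minimal sketch in Python, not Flume itself: Flume is a Java service configured through properties files, and Avro transport is not modelled here. The class names `Channel` and `Agent`, the bounded capacity and the round-robin hand-off to Sinks are illustrative assumptions; round-robin resembles Flume's load-balancing sink group but is not specified in the patent.

```python
from collections import deque
from itertools import cycle
from typing import Callable, Deque, Iterable, List, Optional

# Hypothetical in-process model of a Flume agent; real Flume is a Java service.
Event = dict  # simplified Flume event (headers + body) as a plain dict


class Channel:
    """Bounded FIFO buffer that connects Sources to Sinks."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._events: Deque[Event] = deque()

    def put(self, event: Event) -> bool:
        # Reject the event when the buffer is full so the Source can retry later.
        if len(self._events) >= self.capacity:
            return False
        self._events.append(event)
        return True

    def take(self) -> Optional[Event]:
        # Return the oldest buffered event, or None if the Channel is empty.
        return self._events.popleft() if self._events else None


class Agent:
    """One agent per machine: several Sources feed one Channel drained by several Sinks."""

    def __init__(self, capacity: int, sinks: List[Callable[[Event], None]]) -> None:
        self.channel = Channel(capacity)
        self.sinks = sinks

    def ingest(self, sources: List[Iterable[Event]]) -> int:
        """Push events from every Source into the Channel; return how many were accepted."""
        accepted = 0
        for source in sources:
            for event in source:
                if self.channel.put(event):
                    accepted += 1
        return accepted

    def drain(self) -> int:
        """Hand each buffered event to exactly one Sink, round-robin (an assumption)."""
        delivered = 0
        next_sink = cycle(self.sinks)
        while (event := self.channel.take()) is not None:
            next(next_sink)(event)
            delivered += 1
        return delivered
```

With a capacity of 3 and two Sources offering four events, the fourth event is refused at the Channel, and the three buffered events are split between two Sinks in turn. The bounded buffer is what lets a slow Sink push back on the Sources instead of losing data silently.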

[0042] Step 2. Data storage and caching

[0043] The Source collects data from the data source and passes it to the Channel; the Sink takes data from the Channel and outputs it. The output offline data is uploaded to the HDFS distributed file s...
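The split between the offline module (HDFS) and the real-time module (Kafka) can be sketched as a router. The available text does not say how an event is assigned to one path or the other, so this sketch assumes a hypothetical `mode` header, similar in spirit to Flume's multiplexing channel selector. The HDFS directory layout `/data/offline/dt=<date>/`, the `date` and `key` headers and the partition count are also hypothetical, and `zlib.crc32` stands in for Kafka's default key partitioner, which actually uses murmur2.

```python
import json
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = dict  # {"headers": {...}, "body": {...}}; header names are hypothetical


def route(
    events: List[Event], num_partitions: int = 3
) -> Tuple[Dict[str, List[str]], Dict[int, List[Event]]]:
    """Split events into an HDFS-style batch layout and a Kafka-style topic."""
    hdfs_files: Dict[str, List[str]] = defaultdict(list)
    topic: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        headers = event["headers"]
        if headers["mode"] == "offline":
            # Offline path: one JSON line per event in a date-partitioned file.
            path = f"/data/offline/dt={headers['date']}/part-0000.json"
            hdfs_files[path].append(json.dumps(event["body"], sort_keys=True))
        else:
            # Real-time path: choose a partition from the event key so that
            # events sharing a key land in the same partition, in arrival order.
            partition = zlib.crc32(headers["key"].encode("utf-8")) % num_partitions
            topic[partition].append(event)
    return dict(hdfs_files), dict(topic)
```

Hashing the key keeps all events for one key in a single partition, which preserves their relative order for the downstream Spark Streaming job; date-partitioned directories let a later batch job read one day of offline data without scanning the rest.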



Abstract

The invention discloses a method for offline and real-time data processing based on the Spark big data framework. The method comprises the following steps: collecting data from a data source through Flume; in the offline module, uploading the collected offline data to HDFS (Hadoop Distributed File System); in the real-time module, uploading the real-time data to a Kafka cluster; preprocessing the data on HDFS with Spark; preprocessing the data in the Kafka cluster with Spark Streaming; importing the preprocessed offline data into a Hive data warehouse; importing the preprocessed real-time data into a MySQL database; developing and analyzing the offline data in the Hive data warehouse and transmitting the results to a front-end web page for data visualization; and transmitting the real-time data from the MySQL database to the front-end web page for data visualization. The method provides data processing services with efficient computation and a reliable storage system, and can meet a variety of data processing requirements.
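The two processing paths can be compared in a small, Spark-free Python sketch. It shares one preprocessing function between a batch job (standing in for Spark over HDFS, whose output would be imported into Hive) and a micro-batch loop (standing in for Spark Streaming over Kafka, whose running totals go to MySQL). These are assumptions for illustration only: `sqlite3` substitutes for MySQL, the micro-batches are cut by count rather than by Spark Streaming's time interval, and the `user,page` record format, the `page_counts` table and the batch size are hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (user, page); a hypothetical record format


def clean(raw: str) -> Optional[Record]:
    """Shared preprocessing: parse a 'user,page' line and drop malformed input."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1].lower()


def preprocess(lines: Iterable[str]) -> List[Record]:
    # Apply the same cleaning rules on both the offline and real-time paths.
    return [rec for rec in map(clean, lines) if rec is not None]


def batch_job(lines: Iterable[str]) -> Counter:
    """Offline path: aggregate the whole dataset at once (result would go to Hive)."""
    return Counter(page for _user, page in preprocess(lines))


def micro_batches(stream: Iterable, size: int) -> Iterator[list]:
    """Count-based stand-in for Spark Streaming's time-based batch intervals."""
    batch: list = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def streaming_job(stream: Iterable[str], size: int, conn: sqlite3.Connection) -> None:
    """Real-time path: fold each micro-batch into running totals (sqlite3 stands in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_counts (page TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    for batch in micro_batches(stream, size):
        for page, hits in Counter(page for _user, page in preprocess(batch)).items():
            # Portable two-step upsert: create the row if missing, then add the batch count.
            conn.execute("INSERT OR IGNORE INTO page_counts (page, hits) VALUES (?, 0)", (page,))
            conn.execute("UPDATE page_counts SET hits = hits + ? WHERE page = ?", (hits, page))
        conn.commit()
```

Fed the same input, both paths end with the same totals, which is the property a combined offline and real-time design depends on. Against a real MySQL table, the two-statement upsert would typically be written as a single `INSERT ... ON DUPLICATE KEY UPDATE`.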

Description

Technical Field

[0001] The invention relates to the field of big data processing and analysis, in particular to a method for offline and real-time data processing based on the Spark big data framework.

Background Art

[0002] With the rapid development of network technology, Internet usage keeps increasing, and the volume of data generated by user behavior is growing exponentially. Users generate hundreds of gigabytes of behavioral data on the Internet every second; it is fair to say that we have entered an era of comprehensive data. Big data offers multiple application platform frameworks, each with its own advantages, disadvantages and application scenarios. Hadoop is currently the most commonly used framework for offline batch processing; it is stable but slow, and cannot process data in real time. Spark is a next-generation b...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 柯峰 (Ke Feng), 梁烜彰 (Liang Xuanzhang)
Owner: SOUTH CHINA UNIV OF TECH