Method for processing data offline in real time on the basis of Spark big data frame

A real-time processing and big data technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that solves the problem that offline data and real-time data cannot be processed at the same time, and achieves strong scalability and fault tolerance, efficient data processing and a complete processing workflow.

Inactive Publication Date: 2018-11-23

SOUTH CHINA UNIV OF TECH

4 cites, 18 cited by

Problems solved by technology

[0003] The main purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a method for offline and real-time processing of data based on the Spark big data framework, which addresses the problem that other current big data platforms cannot process offline data and real-time data at the same time.

Embodiment Construction

[0038] The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

[0039] As shown in Figures 1, 2 and 3, a method for offline and real-time processing of data based on the Spark big data framework comprises the following steps:

[0040] Step 1. Offline and real-time data collection

[0041] Build the Spark platform environment on the machine and install MySQL, Flume and other related software. Configure the Flume configuration files to transmit data in Avro mode. Each machine runs a Flume agent; an agent contains multiple Sources and Sinks, and the Channel serves as the conduit connecting the two.
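The Source, Channel and Sink relationship described in [0041] can be illustrated with a small stand-alone model. This is a minimal sketch in plain Python, not Flume's API or configuration: the class `ToyAgent`, the sample events, and the round-robin hand-off to sinks are all assumptions made for illustration, and the patent does not specify how sinks share a channel.

```python
from collections import deque
from itertools import cycle
from typing import Callable, Deque, Iterable, List

Event = str
Sink = Callable[[Event], None]


class ToyAgent:
    """Hypothetical stand-in for one agent: several sources, one channel, several sinks."""

    def __init__(self, sources: List[Iterable[Event]], sinks: List[Sink]) -> None:
        self.sources = sources
        self.sinks = sinks
        self.channel: Deque[Event] = deque()  # buffer between sources and sinks

    def collect(self) -> int:
        """Sources push every event into the channel; returns the number buffered."""
        count = 0
        for source in self.sources:
            for event in source:
                self.channel.append(event)
                count += 1
        return count

    def drain(self) -> None:
        """Each buffered event is consumed once, by the next sink in rotation (assumption)."""
        next_sink = cycle(self.sinks)
        while self.channel:
            next(next_sink)(self.channel.popleft())


# Two in-memory lists play the role of sink destinations.
sink_a_out: List[Event] = []
sink_b_out: List[Event] = []

agent = ToyAgent(
    sources=[["log-a1", "log-a2"], ["log-b1"]],
    sinks=[sink_a_out.append, sink_b_out.append],
)
agent.collect()
agent.drain()
```

The point of the model is only the decoupling: sources never call sinks directly, so either side can be added or slowed down without changing the other, with the channel absorbing the difference.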

[0042] Step 2. Data storage and caching

[0043] The Source collects data from the data source and transmits it to the Channel; the Sink takes data from the Channel and outputs it. The output offline data is uploaded to the HDFS distributed file s...

Abstract

The invention discloses a method for processing offline and real-time data on the basis of the Spark big data framework. The method comprises the following steps: collecting data from a data source through Flume; uploading the collected offline data to HDFS (Hadoop Distributed File System) in the offline module; uploading real-time data to a Kafka cluster in the real-time module; preprocessing the data on HDFS with Spark; preprocessing the data in the Kafka cluster with Spark Streaming; importing the preprocessed offline data into a Hive data warehouse; importing the preprocessed real-time data into a MySQL database; in the Hive data warehouse, developing and analyzing the offline data and transmitting the result to a front-end webpage for data visualization; and, from the MySQL database, transmitting the real-time data to the front-end webpage for data visualization. The method provides an efficient computing service and a reliable storage system for data processing workloads and can meet various data processing requirements.
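The two branches in the abstract can be traced with a small in-memory model. This is a sketch under stated assumptions, not the patented implementation and not Spark code: plain lists stand in for HDFS, Kafka, Hive and MySQL, the `mode` tag used to tell offline from real-time records is hypothetical, and the preprocessing step (trim, lowercase, drop blanks) is invented for illustration because the source does not say what preprocessing involves.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Record:
    mode: str     # "offline" or "realtime" (hypothetical tag, not from the patent)
    payload: str


@dataclass
class ToyPipeline:
    hdfs: List[str] = field(default_factory=list)   # offline landing area
    kafka: List[str] = field(default_factory=list)  # real-time buffer
    hive: List[str] = field(default_factory=list)   # offline warehouse
    mysql: List[str] = field(default_factory=list)  # real-time result store

    def ingest(self, records: Iterable[Record]) -> None:
        """Collection step: route each record to the offline or real-time branch."""
        for record in records:
            target = self.hdfs if record.mode == "offline" else self.kafka
            target.append(record.payload)

    @staticmethod
    def preprocess(rows: Iterable[str]) -> List[str]:
        """Placeholder cleaning: trim whitespace, lowercase, drop blank rows."""
        return [row.strip().lower() for row in rows if row.strip()]

    def run_offline(self) -> None:
        """Batch branch: landing area -> preprocess -> warehouse."""
        self.hive.extend(self.preprocess(self.hdfs))
        self.hdfs.clear()

    def run_realtime(self) -> None:
        """Streaming branch: buffer -> preprocess -> result store."""
        self.mysql.extend(self.preprocess(self.kafka))
        self.kafka.clear()


pipeline = ToyPipeline()
pipeline.ingest([
    Record("offline", " Page_View "),
    Record("realtime", "CLICK"),
    Record("offline", "   "),
])
pipeline.run_offline()
pipeline.run_realtime()
```

In a real deployment the two `run_*` methods would correspond to a Spark batch job and a Spark Streaming job respectively; here they share one preprocessing function only to keep the sketch short.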

Description

Technical field

[0001] The invention relates to the field of big data processing and analysis, in particular to a method for offline and real-time data processing based on the Spark big data framework.

Background technique

[0002] With the rapid development of network technology, Internet usage keeps increasing, and the amount of data generated by user behavior is growing exponentially. The behavioral data generated by users on the Internet every second reaches hundreds of gigabytes; we have entered the era of comprehensive data. There are multiple platform frameworks in big data, each with its own advantages, disadvantages and application scenarios. Currently, Hadoop is the most commonly used framework for offline batch processing. It has good stability, but it is slow and cannot process data in real time. Spark is a next-generation b...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

Inventors: 柯峰, 梁烜彰

Owner: SOUTH CHINA UNIV OF TECH
