Message fault tolerance method and system based on Spark stream computing framework

A stream computing and message fault-tolerance technology, applied to redundancy in operation for data error detection and correction. It addresses problems such as program errors reported on restart and repeated message processing, with the aim of high reliability.

Active Publication Date: 2020-10-09
SUZHOU LANGCHAO INTELLIGENT TECH CO LTD

AI Technical Summary

Problems solved by technology

[0006] When the checkpoint is persisted for the first time, the entire related package and its configuration are serialized into a binary file, which is restored on every restart, but when th



Examples


Embodiment 1

[0071] As shown in Figure 1, the present invention provides a message fault-tolerance method based on the Spark stream computing framework, comprising the following steps:

[0072] S1. Create a workflow program, set a metadata checkpoint task, and save the streaming computing information, with program and configuration version tags added, to the storage system;

[0073] S2. Set the Spark flow to read messages from the message queue and partition them to generate a data set, complete the data set transformation operations in the Spark flow, and create a logical execution plan;

[0074] S3. When the Spark flow contains a stateful transformation and a data checkpoint exists, set the data set for stateful transformation processing, and add program and configuration version tags to the data checkpoint task;

[0075] S4. When the system fails, set the Spark flow to obtain the program and configuration version information from the checkpoint, and when the program and confi...
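The recovery decision in S4 can be illustrated with a minimal Python sketch. It follows the rule given in the abstract: if the program and configuration version information in the checkpoint differs from the running version, a redo task with a new checkpoint is started; otherwise the program restarts from the existing checkpoint. The names `VersionTag`, `RecoveryAction`, and `choose_recovery_action` are hypothetical and are not taken from the patent or from the Spark API.

```python
from enum import Enum
from typing import NamedTuple


class VersionTag(NamedTuple):
    """Hypothetical version label stored alongside a checkpoint."""
    program_version: str
    config_version: str


class RecoveryAction(Enum):
    RESTART_FROM_CHECKPOINT = "restart_from_checkpoint"
    REDO_WITH_NEW_CHECKPOINT = "redo_with_new_checkpoint"


def choose_recovery_action(checkpoint_tag: VersionTag,
                           current_tag: VersionTag) -> RecoveryAction:
    """Compare the tag read from the checkpoint with the tag of the
    program being started and pick the recovery path."""
    if checkpoint_tag == current_tag:
        # Nothing changed: the serialized checkpoint is still compatible.
        return RecoveryAction.RESTART_FROM_CHECKPOINT
    # Program or configuration changed: redo with a fresh checkpoint.
    return RecoveryAction.REDO_WITH_NEW_CHECKPOINT


if __name__ == "__main__":
    saved = VersionTag("1.0.0", "cfg-1")
    print(choose_recovery_action(saved, VersionTag("1.0.0", "cfg-1")).value)
    print(choose_recovery_action(saved, VersionTag("1.1.0", "cfg-1")).value)
```

Comparing both fields at once means a change to either the program or the configuration triggers the redo path, which matches the "program and configuration version information changes" wording.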

Embodiment 2

[0078] As shown in Figure 2, the present invention provides a message fault-tolerance method based on the Spark stream computing framework, comprising the following steps:

[0079] S1. Create a workflow program, set a metadata checkpoint task, and save the streaming computing information, with program and configuration version tags added, to the storage system; the specific steps are as follows:

[0080] S11. Create a workflow program;

[0081] S12. Add program and configuration version tags to the streaming computing information;

[0082] S13. Set the metadata checkpoint task to save the streaming computing information to the storage system; the streaming computing information saved by the metadata checkpoint task includes the configuration used to create the streaming application, the set of discretized stream operations that define the streaming application, and the batches whose jobs are queued but not yet completed;

[0083] S2. Set the Spark flow to read messages from the message queue...
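Steps S11 to S13 can be sketched as a small Python example that builds a metadata checkpoint record carrying the version tags and writes it to storage. This is an illustration under stated assumptions, not the patent's implementation: the record layout, the JSON encoding, the file name `metadata_checkpoint.json`, and the function names are all hypothetical, and a local directory stands in for the fault-tolerant storage system.

```python
import json
from pathlib import Path


def build_metadata_checkpoint(app_config, dstream_operations, pending_batches,
                              program_version, config_version):
    """Assemble the streaming computing information listed in S13 and
    attach the program and configuration version tags from S12."""
    return {
        "version_tag": {"program": program_version, "config": config_version},
        "app_config": app_config,                  # configuration of the application
        "dstream_operations": dstream_operations,  # operations defining the application
        "pending_batches": pending_batches,        # queued but unfinished batches
    }


def save_checkpoint(record, directory):
    """Write the record to a temporary file first, then rename it, so a
    crash mid-write does not leave a half-written checkpoint behind."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "metadata_checkpoint.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(record, sort_keys=True), encoding="utf-8")
    tmp.replace(path)
    return path


def load_checkpoint(directory):
    """Read the record back, including its version tag."""
    path = Path(directory) / "metadata_checkpoint.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

A text format is used here only to keep the version tag readable without deserializing the rest of the record; the encoding actually used by the invention is not stated in this excerpt.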

Embodiment 3

[0105] As shown in Figure 3, the present invention provides a message fault-tolerance system based on the Spark stream computing framework, comprising:

[0106] The metadata checkpoint version adding module 1, used to create a workflow program, set the metadata checkpoint task, and save the streaming computing information, with program and configuration version tags added, to the storage system; the metadata checkpoint version adding module 1 includes:

[0107] The workflow program creation unit 1.1, used to create the workflow program;

[0108] The first tag adding unit 1.2, used to add program and configuration version tags to the streaming computing information;

[0109] The metadata checkpoint saving unit 1.3, used to set the metadata checkpoint task to save the streaming computing information to the storage system;

[0110] The Spark flow working module 2, used to set the Spark flow to read messages from the message queue and perform message partition p...
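The message partitioning performed by the Spark flow working module can be pictured with a short Python sketch. The patent excerpt does not state how messages are assigned to partitions, so hashing the message key is an assumption made here for illustration, and `partition_for` and `partition_messages` are hypothetical names.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index with a stable hash
    (CRC32), so the same key always lands in the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_messages(messages, num_partitions):
    """Group (key, value) messages read from the queue into
    num_partitions lists, one list per partition of the data set."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return [partitions[i] for i in range(num_partitions)]


if __name__ == "__main__":
    batch = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]
    for index, part in enumerate(partition_messages(batch, 3)):
        print(index, part)
```

A stable hash is chosen instead of Python's built-in `hash`, whose values for strings vary between interpreter runs, so that a restarted program assigns messages to the same partitions.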


Abstract

The invention provides a message fault-tolerance method and system based on the Spark stream computing framework. The method comprises the following steps: creating a workflow program, setting a metadata checkpoint task, and storing the streaming computing information, with program and configuration version tags added, in a storage system; setting the Spark stream to read messages from the message queue to obtain a data set, completing the data set transformation operations in the Spark stream, and creating a logical execution plan; when a stateful transformation exists in the Spark stream and a data checkpoint exists, setting the data set for stateful transformation processing, and adding program and configuration version tags to the data checkpoint task; when the system fails, setting the Spark stream to obtain the program and configuration version information from the checkpoint, starting a redo task and the checkpoint when the program and configuration version information has changed, and restarting the program when it has not changed; and starting the new version of the program and performing subsequent data recovery processing through the new version's checkpoint.

Description

Technical field

[0001] The invention belongs to the technical field of stream computing fault tolerance, and in particular relates to a message fault-tolerance method and system based on the Spark stream computing framework.

Background

[0002] Spark Streaming is a framework, an extension of the Spark core API, that enables high-throughput, fault-tolerant processing of real-time data streams. In this patent, Spark Streaming is referred to as the Spark flow for short.

[0003] Streaming applications must run 24/7 and must therefore be resilient to failures unrelated to application logic (e.g., system failures, JVM crashes). To do this, Spark Streaming needs to checkpoint enough information to a fault-tolerant storage system so that it can recover from failures.

[0004] Spark Streaming provides a checkpoint mechanism for message recovery processing. Checkpoints hold two types of data. One is the metadata checkpoint, which saves the information defining t...


Application Information

IPC(8): G06F11/14
CPC: G06F11/1489, G06F11/1438
Inventor: 魏健, 赵波
Owner: SUZHOU LANGCHAO INTELLIGENT TECH CO LTD