
Processing method for reading Kafka data based on Spark Streaming

A data processing method, applied in the fields of electrical digital data processing, special data processing applications, relational databases, etc. It addresses problems such as data loss, loss of cached data, and caches that cannot be recovered, with the effect of guarding against data loss.

Active Publication Date: 2017-05-31
上海轻维软件有限公司
Cites: 4. Cited by: 25.

AI Technical Summary

Problems solved by technology

As a result, cached data that has been acknowledged to the data source but not yet processed is lost; 7. Cached data cannot be recovered, because it is held in the Executor's memory, so the data is lost.



Examples


Detailed Description of the Embodiments

[0021] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0022] The processing method for reading Kafka data based on Spark Streaming provided by the present invention uses a relational database to create two tables: a scheduling table (control) and a failure record table (failure). The scheduling table stores scheduling information, including the scheduling id, start time, end time, status, creation time and other information. The failure record table stores the details of each failed data record, including the failure record id, offset, topic, Kafka node list and other information. The scheduling id in the scheduling table and the id of the failure record table are related as primary key and foreign key.
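The two tables described in [0022] could be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not the schema from the patent: the column names, the column types, and the separate `control_id` foreign-key column are assumptions (the source only says the two ids are related as primary and foreign key), and `kafka_offset` is used instead of `offset` to avoid the SQL keyword.

```python
import sqlite3


def create_tables(conn):
    """Create a hypothetical scheduling table and failure record table."""
    conn.execute("PRAGMA foreign_keys = ON")
    # Scheduling table: one row per scheduled run (column names are assumed).
    conn.execute("""
        CREATE TABLE control (
            id          INTEGER PRIMARY KEY,
            start_time  TEXT NOT NULL,
            end_time    TEXT,
            status      TEXT NOT NULL,
            create_time TEXT NOT NULL
        )""")
    # Failure record table: one row per failed record, linked to a run.
    conn.execute("""
        CREATE TABLE failure (
            id           INTEGER PRIMARY KEY,
            control_id   INTEGER NOT NULL REFERENCES control(id),
            topic        TEXT NOT NULL,
            kafka_offset INTEGER NOT NULL,
            brokers      TEXT NOT NULL  -- Kafka node list, comma-separated
        )""")


conn = sqlite3.connect(":memory:")
create_tables(conn)
conn.execute(
    "INSERT INTO control (id, start_time, status, create_time) "
    "VALUES (1, '2017-01-01T00:00:00', 'FAILED', '2017-01-01T00:00:00')"
)
conn.execute(
    "INSERT INTO failure (control_id, topic, kafka_offset, brokers) "
    "VALUES (1, 'events', 42, 'broker1:9092,broker2:9092')"
)
# Look up every failed offset that belongs to a given scheduling run.
failed = conn.execute(
    "SELECT topic, kafka_offset FROM failure WHERE control_id = ?", (1,)
).fetchall()
```

Keeping the offset, topic and node list in the failure row means a later run has everything it needs to locate the failed record again without consulting the original job.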

[0023] When Spark Streaming connects to Kafka to read and process data, first, the createDirectStream method of SparkStreamin...


Abstract

The invention discloses a processing method for reading Kafka data based on Spark Streaming. The method comprises the following steps: S1) storing data into a topic using Kafka; S2) partitioning the real-time input data stream into blocks, one per time slice, using Spark Streaming; S3) presetting the Spark Streaming backfill scheduling time according to the Kafka failure records; S4) monitoring the Spark Streaming based Kafka read process in real time; and S5) re-reading Kafka data using Spark Streaming. Because the backfill scheduling time is set according to the Kafka failure records, the read process is monitored in real time, and the failed records are re-read to backfill them, the method provides a relatively flexible and convenient guarantee against data loss.
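The record-then-re-read idea behind steps S4 and S5 can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the patented implementation: a Python list stands in for a Kafka topic partition (the list index plays the role of the offset), `FailureLog` stands in for the failure record table, and all function names are hypothetical. The scheduling of the re-read (step S3) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class FailureLog:
    """In-memory stand-in for the failure record table (hypothetical)."""
    offsets: list = field(default_factory=list)


def process_batch(records, handler, failures):
    """Apply handler to (offset, value) pairs; log offsets whose handler raises."""
    done = []
    for offset, value in records:
        try:
            done.append(handler(value))
        except Exception:
            failures.offsets.append(offset)
    return done


def backfill(topic_log, handler, failures):
    """Re-read the logged offsets from the topic and retry them."""
    pending, failures.offsets = failures.offsets, []
    return process_batch([(o, topic_log[o]) for o in pending], handler, failures)


topic_log = ["10", "20", "oops", "40"]
failures = FailureLog()

# First pass: the record at offset 2 cannot be parsed and is logged.
first = process_batch(list(enumerate(topic_log)), int, failures)

# Backfill pass: retry only the logged offsets with a more tolerant handler.
second = backfill(topic_log, lambda v: int(v) if v.isdigit() else 0, failures)
```

Because only the offsets are logged, the retry reads the data from the topic again rather than from any in-memory cache, which is the property the abstract relies on to avoid losing records.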

Description

Technical field

[0001] The invention relates to a Kafka data processing method, in particular to a processing method for reading Kafka data based on Spark Streaming.

Background

[0002] Spark Streaming decomposes streaming computation into a series of short batch jobs. The batch processing engine is Spark: the input data of Spark Streaming is divided into segments (a Discretized Stream, or DStream) according to the batch size (such as 1 second), and each segment is converted into an RDD (Resilient Distributed Dataset) in Spark. Each Transformation operation on the DStream in Spark Streaming then becomes a Transformation operation on an RDD in Spark, and the RDD is converted into an intermediate result and stored in memory. Depending on business needs, the overall streaming computation can accumulate the intermediate results or store them in external devices. Figure 1 shows the entire process of Spark Streaming.

[0003] Kafka is a distributed...
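The batch decomposition described in [0002] can be pictured with a toy example. This is a plain-Python sketch of the slicing idea only and does not use Spark; the function names and the `(timestamp, value)` event shape are assumptions, and time slices that contain no events are simply omitted here.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp_seconds, value) events by the time slice they fall in."""
    batches = defaultdict(list)
    for ts, value in events:
        slice_start = int(ts // batch_interval) * batch_interval
        batches[slice_start].append(value)
    # Return the batches in time order.
    return [batches[k] for k in sorted(batches)]


def run(events, batch_interval, transform):
    """Apply one transformation to every batch, as a per-batch job would."""
    return [[transform(v) for v in batch]
            for batch in discretize(events, batch_interval)]


events = [(0.1, "a"), (0.9, "b"), (1.2, "c"), (2.5, "d")]
results = run(events, 1, str.upper)
```

With a 1-second interval the four events above fall into three slices, and the transformation is applied slice by slice rather than event by event.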


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/13, G06F16/1815, G06F16/182, G06F16/284
Inventor: 程永新, 谢涛, 王仁铮
Owner: 上海轻维软件有限公司