Method for processing large batches of data in real time based on Spark Streaming

A real-time data processing technology, applied in the field of data processing, that addresses slow processing and the resulting Kafka data backlog and achieves fast processing.

Pending Publication Date: 2021-02-02
银盛支付服务股份有限公司
AI Technical Summary

Problems solved by technology

[0003] In the prior art, real-time aggregation calculations over large batches of streaming data are processed too slowly, which causes a backlog of Kafka data.
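The backlog problem can be illustrated with simple arithmetic. The sketch below is not from the patent; it assumes constant produce and consume rates, and the function name and the rates used are hypothetical.

```python
def backlog_after(seconds, produce_rate, consume_rate, initial=0):
    """Messages left unconsumed after `seconds`.

    Assumes constant rates in messages per second (a simplified,
    hypothetical model; real traffic is bursty).
    """
    backlog = initial + (produce_rate - consume_rate) * seconds
    # A queue cannot hold a negative number of messages.
    return max(backlog, 0)


# Consumer slower than producer: the queue grows by 2,000 messages per second.
print(backlog_after(60, produce_rate=10_000, consume_rate=8_000))   # 120000

# Consumer faster than producer: no backlog accumulates.
print(backlog_after(60, produce_rate=10_000, consume_rate=12_000))  # 0
```

As long as the consumer is slower than the producer, the backlog grows linearly with time, which is why the processing speed of the streaming job matters.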

Examples

Embodiment Construction

[0026] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0027] The idea, specific structure and technical effects of the present invention will be described clearly and completely below in conjunction with the embodiments and accompanying drawings, so that the purpose, features and effects of the present invention can be fully understood. The described embodiments are only some of the embodiments of the present invention, not all of them. Other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention. In addition, the connection relationships mentioned in this patent do not refer only to the direct connection of components; they mean that a better connection structure can be formed by adding or removing connection accessories according to specific implementat...

Abstract

The invention provides a method for processing large batches of data in real time based on Spark Streaming, and relates to the technical field of data processing. The method comprises the following steps: S1, pushing large-batch real-time data: a business process system generates large batches of business data in real time and pushes them to a Kafka cluster in real time to form a Kafka data queue; S2, performing ETL processing of the data: the Spark cluster performs ETL processing by consuming the Kafka cluster data; S3, configuring Apolo to integrate with the Spark Streaming program, which comprises setting up the Apolo tool in the cluster, configuring the Apolo tool to integrate the Spark Streaming program, and using the Apolo configuration; S4, writing the results computed by the Spark Streaming program to MongoDB: after the real-time program consumes the Kafka data queue, the calculation result is written to MongoDB for storage; S5, submitting the Spark Streaming program to the Spark cluster for execution. The beneficial effects of the invention are that the Spark Streaming framework processes large amounts of data at high speed, which avoids the Kafka data backlog problem.
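The data flow of steps S1, S2 and S4 can be sketched in plain Python. This is only an illustration under stated assumptions, not the patented implementation: a `deque` stands in for the Kafka data queue, a `dict` stands in for the MongoDB collection, and the field names `merchant_id` and `amount` are hypothetical. Steps S3 (Apolo configuration) and S5 (submitting to the Spark cluster) are not modeled.

```python
import json
from collections import defaultdict, deque

# Stand-in for the Kafka data queue of S1 (assumption: a real system uses a Kafka topic).
kafka_queue = deque()


def push_business_event(merchant_id, amount):
    """S1: the business system pushes one raw JSON record onto the queue."""
    kafka_queue.append(json.dumps({"merchant_id": merchant_id, "amount": amount}))


def etl(raw):
    """S2: parse and validate one record; return None if it must be dropped."""
    try:
        record = json.loads(raw)
        return record["merchant_id"], float(record["amount"])
    except (ValueError, KeyError, TypeError):
        return None


def process_batch(store, max_records=1000):
    """S2 + S4: drain up to max_records, aggregate per key, add totals to the store."""
    totals = defaultdict(float)
    for _ in range(min(max_records, len(kafka_queue))):
        parsed = etl(kafka_queue.popleft())
        if parsed is not None:
            merchant_id, amount = parsed
            totals[merchant_id] += amount
    for merchant_id, amount in totals.items():
        store[merchant_id] = store.get(merchant_id, 0.0) + amount
    return len(totals)


# Stand-in for a MongoDB collection keyed by merchant_id (hypothetical schema).
result_store = {}

push_business_event("m1", 10.0)
push_business_event("m2", 5.5)
push_business_event("m1", 2.5)
kafka_queue.append("not json")  # malformed record, dropped by the ETL step

process_batch(result_store)
print(result_store)  # {'m1': 12.5, 'm2': 5.5}
```

Aggregating within the batch before writing means the store receives one update per key per batch rather than one per record, which is one common way to keep the sink from becoming the bottleneck.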

Description

Technical field

[0001] The present invention relates to the technical field of data processing, and more specifically to a method for real-time processing of large batches of data based on Spark Streaming.

Background technique

[0002] Spark Streaming is a framework that extends the Spark core API. It provides high-throughput, fault-tolerant processing of real-time data streams. It is mainly used for micro-batch processing of real-time data, handling the data at fixed time intervals. Spark Streaming receives real-time input data from various sources such as Kafka, Flume, and HDFS, and after processing, stores the results in destinations such as HDFS and databases.

[0003] In the prior art, real-time aggregation calculations over large batches of streaming data are processed too slowly, resulting in a backlog of Kafka data.

Contents of the inventio...
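The micro-batch idea from the background section, processing data at fixed time intervals, can be shown with a small standard-library sketch. It is a simplification and not Spark code: records are grouped here by a timestamp carried with each record, whereas a streaming engine typically forms batches as data arrives. The function name and the sample events are hypothetical.

```python
from collections import defaultdict


def micro_batches(events, interval_seconds):
    """Group (timestamp, value) pairs into consecutive fixed-length intervals.

    Returns a dict mapping each interval's start time to the values in it.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        # Round the timestamp down to the start of its interval.
        batch_start = int(timestamp // interval_seconds) * interval_seconds
        batches[batch_start].append(value)
    return dict(sorted(batches.items()))


events = [(0.2, "a"), (1.7, "b"), (4.9, "c"), (5.0, "d"), (11.3, "e")]
for start, values in micro_batches(events, interval_seconds=5).items():
    print(start, values)
# 0 ['a', 'b', 'c']
# 5 ['d']
# 10 ['e']
```

Each batch is then processed as a unit. If processing one batch takes longer than the interval, unprocessed batches accumulate, which corresponds to the backlog described in [0003].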

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/25, G06F9/54, H04L29/08
CPC: G06F16/254, G06F9/546, G06F2209/548, H04L67/51
Inventor: 李佳喜, 刘跃红, 管正爽, 黄位友
Owner: 银盛支付服务股份有限公司