Parallel data reflow method under stream computing environment

A data processing technology applied in the information field. It addresses problems such as increased system complexity, frequent Tuple replay, and increased system load, and achieves improved system responsiveness, priority processing of reflowed data, and fault tolerance.

Active Publication Date: 2017-09-12
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0007] The invention aims to overcome a deficiency of existing real-time stream computing: the stateful data processing methods used to achieve data fault tolerance increase the complexity of the system. In addition, when the Topology is overloaded, Tuple shows fr



Detailed description of the embodiments

[0039] In order to make the above-mentioned features and processes of the present invention more comprehensible, the following specific embodiments are described in detail with reference to the accompanying drawings.

[0040] Referring to Figures 1 to 4, a parallel data reflow method for real-time stream computing uses Apache Storm as the real-time stream computing system and Apache Kafka as the data queue. Spouts in Apache Storm are divided into reliable Spouts and unreliable Spouts. A reliable Spout implements at-least-once semantics: it resends failed Tuples to ensure that each Tuple is processed at least once, which is a stateful handling of data. An unreliable Spout implements at-most-once semantics and does not reprocess Tuples that fail to be sent. Because the data sending method nextTuple() and the data confirmation methods ack() / fail() of a Spout in Storm are called serially in the same thread, where ack() is the function called when the Tuple ...
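The difference between the two delivery semantics can be illustrated with a small single-threaded model. This is a minimal sketch in plain Python, not the Storm API: the class `ToySpout`, the helper `run`, and the `fail_once` parameter are hypothetical names introduced here, and replaying failed Tuples before fresh ones is a choice of this sketch rather than something the text specifies.

```python
from collections import deque


class ToySpout:
    """Toy model of a spout; an illustration only, not the Storm API.

    reliable=True  -> at-least-once: failed tuples are queued for resending.
    reliable=False -> at-most-once: failed tuples are dropped.
    """

    def __init__(self, source, reliable):
        self.source = deque(source)   # tuples not yet emitted
        self.reliable = reliable
        self.pending = {}             # msg_id -> tuple awaiting ack/fail
        self.replay = deque()         # failed tuples waiting to be resent
        self.next_id = 0

    def next_tuple(self):
        """Emit one (msg_id, value) pair, or None when nothing is left."""
        if self.replay:
            value = self.replay.popleft()
        elif self.source:
            value = self.source.popleft()
        else:
            return None
        msg_id = self.next_id
        self.next_id += 1
        if self.reliable:
            self.pending[msg_id] = value   # state kept until the tuple is acked
        return msg_id, value

    def ack(self, msg_id):
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        # An unreliable spout kept no state, so there is nothing to resend.
        value = self.pending.pop(msg_id, None)
        if value is not None:
            self.replay.append(value)


def run(spout, fail_once):
    """Drive the spout serially in one thread: emit, then ack or fail."""
    processed = []
    already_failed = set()
    while True:
        emitted = spout.next_tuple()
        if emitted is None:
            break
        msg_id, value = emitted
        if value in fail_once and value not in already_failed:
            already_failed.add(value)      # simulate one downstream failure
            spout.fail(msg_id)
        else:
            processed.append(value)
            spout.ack(msg_id)
    return processed
```

With a source of `["a", "b", "c"]` in which `"b"` fails once, the reliable variant ends up processing all three values, while the unreliable variant never processes `"b"`. The `pending` dictionary is the per-Tuple state that makes the reliable variant stateful.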


Abstract

The invention provides a parallel data reflow method for real-time stream computing. The method comprises the following steps:
Step 1: initialize three queues.
Step 2: initialize the Data Queue pipe.
Step 3: the Spout of the Topology issues read requests to the Data Queue.
Step 4: the Data Queue reads data from the three queues.
Step 5: determine whether the queue pointed to by ToP is empty; if it is empty, go to step 6; otherwise, go to step 7.
Step 6: copy the data in the From queue to the To queue, and clear the From queue.
Step 7: the Topology obtains data from the Data Queue, and the current Task sends a Tuple downstream.
Step 8: the current Task waits for feedback on the sent Tuple; if sending fails, or times out with no feedback received, the Tuple is selected for reflow.
Step 9: determine whether the Topology can be stopped; if it cannot, return to step 4; otherwise, end.
With this method, data remains stateless yet fault tolerant, data computation latency is reduced, system responsiveness is improved, and reflowed data is processed with priority at the earliest opportunity.
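The steps in the abstract can be sketched roughly as follows. This is a single-threaded illustration under stated assumptions, not the patented implementation: the abstract does not name the three queues, so the sketch assumes they are a queue of fresh source data plus the From and To queues, and that ToP refers to the To queue. The names `ReflowDataQueue`, `run_topology`, `send`, and `max_reads` are hypothetical.

```python
from collections import deque


class ReflowDataQueue:
    """Sketch of a three-queue Data Queue; the queue roles are assumptions."""

    def __init__(self, fresh):
        self.fresh = deque(fresh)   # new data from the source (assumed role)
        self.from_q = deque()       # collects tuples selected for reflow
        self.to_q = deque()         # reflowed tuples ready to be read again

    def read(self):
        """Return the next item, preferring reflowed data; None if all empty."""
        if not self.to_q:                    # step 5: is the To queue empty?
            self.to_q.extend(self.from_q)    # step 6: copy From into To,
            self.from_q.clear()              #         then clear From
        if self.to_q:
            return self.to_q.popleft()       # reflowed data goes first
        if self.fresh:
            return self.fresh.popleft()
        return None

    def reflow(self, item):
        """Step 8: queue a tuple whose send failed or timed out."""
        self.from_q.append(item)


def run_topology(queue, send, max_reads=1000):
    """Steps 3 to 9: read, send downstream, reflow on failure, repeat.

    `send` is a caller-supplied function returning True on success.
    `max_reads` and an empty queue stand in for the step 9 stop check.
    """
    delivered = []
    for _ in range(max_reads):
        item = queue.read()
        if item is None:
            break
        if send(item):
            delivered.append(item)
        else:
            queue.reflow(item)
    return delivered
```

Note that the queue itself holds no acknowledgement state per Tuple; a failed Tuple simply re-enters through the From queue and is read ahead of fresh data on a later pass. The parallelism the method's title refers to is not modelled here.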

Description

Technical field

[0001] The invention relates to the field of information technology, in particular to a parallel data reflow method in a stream computing environment.

Background art

[0002] Real-time stream computing is widely used in industry today, from social network information (providing hot topics or real-time search) to advertising data processing engines; example systems include Apache Storm, Twitter's Heron, Apache Flink, Spark Streaming, and Samza. In these systems, data generation is determined entirely by the data source. Dynamic changes and inconsistency in the data source make the rate of the data flow bursty, and this burstiness often leads to overload. Overload has several causes: network congestion, high resource utilization, interference, heterogeneity, frequent I/O blocking, etc. Therefore, in real-time stream computing, overload is common and unavoi...


Application Information

IPC(8): G06F9/38, G06F17/30
CPC: G06F9/3851, G06F16/27
Inventor 陆佳炜陈烘周焕马俊高燕煦李杰卢成炳徐俊肖刚
Owner ZHEJIANG UNIV OF TECH